Deloitte Analytics Senior Advisor Tom Davenport warned last year that data scientists waste too much time prepping data. After interviewing data scientists, Davenport concluded that they needed better tools for data integration and curation.
Now, a Ventana Research column shows that data scientists aren’t the only ones wasting enormous amounts of time on data preparation at the expense of actual analysis.
Ventana CEO Mark Smith shares research from several reports, all of which demonstrate how much of a time suck data preparation can be without the right tools.
For instance, Big Data projects require organizations to spend 46 percent of their time preparing data and 52 percent of their time checking data quality and consistency. You may think that makes sense; after all, Big Data can require a lot of integration work. Yet those numbers closely reflect what happens in other analytics projects across the enterprise. Business analysts, finance organizations, customer-facing departments (such as sales and marketing), as well as HR, manufacturing and supply chains all spend the majority of their time preparing and checking the data, Smith writes.
“Our information optimization research shows that most analysts spend the majority of their time not in actual analysis but in readying the data for analysis,” Smith writes. “More than 45 percent of their time goes to preparing data for analysis or reviewing the quality and consistency of data.”
Where Davenport saw a need for new tools to address the problem, Smith says the tools exist and aren’t being used. Part of the problem is that organizations are using existing BI and analytics products, which he says are not flexible enough. Though there are more than a dozen dedicated data preparation tools that can help, about one-third of organizations still handle these problems manually.
“Many of today’s analytics and business intelligence products do not provide enough flexibility, and data management tools for data integration are too complicated for analysts who need to interact ad hoc with data,” Smith writes. “Even worse, many organizations use spreadsheets because they are familiar and easy to work with.”
Alas, he doesn’t mention specific solutions, although those may be in one of the benchmark reports that he does mention, such as the Information Optimization Benchmark.
Smith does list a number of key capabilities for any analytics data preparation tool:
- Analyst-friendly—More than half of participants in a Ventana benchmark survey said that was a top requirement.
- Support for large numbers and different types of sources—Ninety-two percent of respondents said they had 16 to 20 data sources, with 80 percent saying they had more than 20.
- Ability to access and integrate data wherever it resides, including in the cloud or on the web.
- Color for highlighting data patterns.
- Reconciliation of duplicate and incorrect data.
- Support for collaboration.
- Support for ad hoc and other agile approaches to working with data that map to how the business operates.
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist.