“Data is the business” became a common refrain in 2014. The push to be data-driven gave way to data over-consumption as organizations rushed to embrace analytics, cloud and Big Data.
Will this year’s data exuberance lead to a data hangover in 2015? Some of the predictions I’ve seen lead me to suspect that 2015 will be the year that organizations sober up.
Data services company BDNA predicts that as data becomes the “backbone of the global economy,” more organizations will demand and invest in clean data. “As big data fades into the background, ‘clean data’ will take its place at the top of the IT trend heap,” according to BDNA. “Inaccurate or corrupted – so-called ‘dirty’ – data has no value to its users or owners, and may as well not exist.”
Big Data will play a major role in driving this change, of course. We’ve had a good three or four years of exploring Big Data and are realizing that just because you can keep everything doesn’t mean you should, or that you’ll be able to manage and use it well if you do.
In an Australian Business Review column, Teradata predicts this will lead to redesigning and rebuilding some data ingestion and integration tasks. In particular, Teradata sees organizations investing in data integration optimization services rather than data replication and in-memory computing.
“Organizations are also likely to gain a better understanding of the relative value of data, not just the cost and monetization,” the column notes.
Some industries have already seen both the costs and limitations of using “bad” data. For instance, insurers say data integration and data quality problems are major impediments to using predictive analytics in anti-fraud technology, according to Insurance & Technology.
“Addressing data quality and integration issues is critical to producing a successful model,” the article warns. “The quality of fraud analytics depends directly on the quality of the input data.”
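The article doesn’t walk through an example, but the “garbage in, garbage out” point is easy to make concrete. Here is a minimal, hypothetical Python sketch (field names, threshold and ID format are all assumptions, not drawn from the article) showing how a simple rule-based fraud check misses a repeat filer when claimant IDs arrive in inconsistent formats, and catches it once the key field is cleansed.

```python
# Hypothetical illustration: a naive fraud rule ("flag claimants filing 3+
# claims") fails when the same claimant appears under inconsistent IDs,
# so cleansing the key field directly changes the model's output.
from collections import Counter

raw_claims = [
    {"claimant_id": "AB-1001", "amount": 4200},
    {"claimant_id": "ab1001 ", "amount": 3900},   # same person, dirty ID
    {"claimant_id": "AB 1001", "amount": 4500},   # same person, dirty ID
    {"claimant_id": "CD-2002", "amount": 800},
]

def normalize_id(claimant_id: str) -> str:
    """Strip whitespace and separators and upper-case the ID."""
    return "".join(ch for ch in claimant_id.upper() if ch.isalnum())

def flag_frequent_claimants(claims, threshold=3):
    """Return the set of claimant IDs with `threshold` or more claims."""
    counts = Counter(c["claimant_id"] for c in claims)
    return {cid for cid, n in counts.items() if n >= threshold}

# Dirty data: three filings by one claimant look like three different people.
print(flag_frequent_claimants(raw_claims))    # set() -> nothing flagged

# Cleansed data: the same rule now catches the repeat filer.
clean_claims = [{**c, "claimant_id": normalize_id(c["claimant_id"])} for c in raw_claims]
print(flag_frequent_claimants(clean_claims))  # {'AB1001'}
```

The model logic never changes here; only the input data does, which is exactly the dependency the article describes.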
Data integration and data cleansing are also top concerns for federal agencies, including the Department of Transportation, the Department of Agriculture and the Department of Homeland Security (DHS). Data officials from each shared their data quality challenges at a December industry forum covered by FedScoop.
Data silos and quality problems first became an issue for the DHS in the aftermath of the Boston Marathon bombing, according to Donna Roy, the executive director for the DHS’s Information Sharing Environment Office. The agency found that it had more than 40 systems with more than 900 datasets, each requiring separate logins for analysts.
That’s when the organization realized it needed to separate its systems from its data, Roy said. Even so, access was only about 20 percent of the problem; data quality accounted for roughly 80 percent of the work. To fix both, the DHS focused on cleaning the data as it moved into a data lake.
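The forum coverage doesn’t describe how the DHS actually cleanses data on the way into its lake, but the general pattern is straightforward: validate, normalize and de-duplicate records at ingest rather than after they land. Below is a minimal, hypothetical Python sketch of that ingest-time cleansing pattern; the field names, validation rules and “lake” directory are illustrative assumptions only.

```python
# Hypothetical sketch of ingest-time cleansing: validate, normalize and
# de-duplicate records before they land in the lake, rather than fixing
# them after the fact. Field names and rules are illustrative only.
import json
from datetime import datetime
from pathlib import Path

REQUIRED_FIELDS = {"record_id", "source_system", "created_at"}

def clean_record(raw: dict) -> dict | None:
    """Return a normalized record, or None if it is too dirty to keep."""
    if not REQUIRED_FIELDS.issubset(raw):
        return None  # incomplete record: reject at the gate
    try:
        created = datetime.fromisoformat(str(raw["created_at"]))
    except ValueError:
        return None  # unparseable timestamp
    return {
        "record_id": str(raw["record_id"]).strip().upper(),
        "source_system": str(raw["source_system"]).strip().lower(),
        "created_at": created.isoformat(),
    }

def ingest(records: list[dict], lake_dir: str = "lake") -> int:
    """Write cleansed, de-duplicated records to the lake as JSON lines."""
    seen, kept = set(), []
    for raw in records:
        rec = clean_record(raw)
        if rec and rec["record_id"] not in seen:
            seen.add(rec["record_id"])
            kept.append(rec)
    Path(lake_dir).mkdir(exist_ok=True)
    (Path(lake_dir) / "cleansed.jsonl").write_text(
        "\n".join(json.dumps(r) for r in kept)
    )
    return len(kept)

if __name__ == "__main__":
    sample = [
        {"record_id": " a-1 ", "source_system": "SysA", "created_at": "2014-12-01"},
        {"record_id": "A-1", "source_system": "sysa", "created_at": "2014-12-01"},  # duplicate
        {"record_id": "b-2", "source_system": "SysB"},                              # missing field
    ]
    print(ingest(sample), "clean record(s) written")
```

The point of cleaning at the ingestion boundary is that downstream analysts query one consistent copy instead of reconciling 40-plus systems’ worth of conflicting formats after the fact.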
Data quality isn’t quite as exciting as NoSQL databases or social media, but life can’t always be a Big Data party. At some point, it’s time to clean up.
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson on Google+ and Twitter.