Data Integration Remains a Major IT Headache
Study shows that data integration is still costly and requires a lot of manual coding.
The federal government's plan to invest $200 million in Big Data isn't just a major financial commitment to help find answers; it's a bet on a new and better future:
"The administration is banking on big data techniques having revolutionary effects on par with the Internet, which federal dollars financed decades ago," according to GigaOm.
Right away, the announcement triggered questions about how the government would staff all of these Big Data initiatives. It's a well-known fact that data scientists with Big Data experience (and largely, that's still translating into "Hadoop," "MapReduce" and "R") are in short supply.
Actually, some of that money will help address that shortage. The National Science Foundation has promised to encourage research universities to "develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers" and give $2 million for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data.
That's great news for the Big Data industry. But to me, it raises a question: What about integration?
I mean, we're talking about the government here. It's siloed by nature: organizationally, culturally and, often, politically. Ask anyone who's worked in integration, and they'll tell you the biggest barrier is organizational rice bowling, with each group guarding its own turf and its own data.
And we're also talking about government data, which means major legacy systems. Some of this data may be sitting in systems that have been hand-coded by Noah himself (think about that Big Data challenge for a minute!).
Randall Jackson, MarkLogic's vice president for the public sector, sent me an email that gave me reason to believe I'm not the only one concerned about the integration side of these projects.
"The federal government faces significant challenges when it comes to effectively extracting and leveraging big data, especially in real-time. This is mostly due to the underlying technology that is traditionally used," Jackson stated. "Much of the data is in silos' that do not quickly or easily interact with each other."
Jackson pointed out there are other challenges with the government's data: There's a never-ending rush of new information, plus it's distributed across the nation and the world, which means it's challenging to get an accurate "snapshot" of the data.
To give you some idea of what the government's facing in terms of information management, consider this example shared by Kaigham Gabriel, the acting director for DARPA.
The DoD's Big Data challenge is comparable to trying to find a single object in the Atlantic Ocean's nearly 100 billion billion gallons of water (roughly 350 million cubic kilometers), according to Gabriel:
If each gallon of water represented a byte or character, the Atlantic Ocean would be able to store, just barely, all the data generated by the world in 2010. Looking for a specific message or page in a document would be the equivalent of searching the Atlantic Ocean for a single 55-gallon drum.
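If you want to sanity-check Gabriel's analogy, the conversion is simple enough to run yourself. Here's a quick back-of-envelope sketch (assuming US gallons and the rough 350-million-cubic-kilometer volume cited above):

```python
# Back-of-envelope check: how many gallons are in ~350 million cubic kilometers?
KM3_TO_M3 = 1e9           # cubic meters per cubic kilometer
GALLONS_PER_M3 = 264.172  # US gallons per cubic meter

atlantic_km3 = 350e6      # rough volume of the Atlantic Ocean, per Gabriel
gallons = atlantic_km3 * KM3_TO_M3 * GALLONS_PER_M3

# On the order of 9 x 10^19 -- i.e., roughly 100 billion billion gallons,
# or about 90 exabytes at one byte per gallon.
print(f"{gallons:.2e} gallons")
```

At one byte per gallon, that works out to tens of exabytes, which gives a sense of the scale the DoD is describing.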
But how big of a challenge will integration be? I'll share what I've learned in my next post.