The Business Impact of Big Data
Many business executives want more information than ever, even though they're already drowning in it.
As I mentioned yesterday, the goals are lofty, and that's putting it mildly. But what about integration? Will integration be a major factor in this Big Data play?
When you talk about Big Data, most people think about Hadoop, so let's start there. Obviously, integrating with Hadoop has been a big topic in the past year, with most of the focus on how you pull data from Hadoop and into existing analytics systems. I've written about it numerous times on this blog as data vendors announced Hadoop connectors and other ways to solve the integration problem on that end.
But I wondered about the issue of going the other way, into Hadoop. Is that a problem when you're dealing with legacy systems? I asked Forrester analyst James Kobielus. No, not really, he responded.
"Think of Hadoop as in the broadest sense, just another type of database for analytics, just like a data warehouse or an OLAP cube on a different level," Kobielus explained. "So how do you get data into these environments? You use data loading tools - ETL and data replication tools. There's plenty of tools out there in the industry at large."
In fact, he added, the major data warehousing companies already support bidirectional connectors to move data into and out of Hadoop clusters.
But the federal government isn't just talking about Hadoop, he added. It's talking about all Big Data options.
"I expect the feds are going to use big data platforms of various sorts to do advanced analytics - complex content, structured data coming from various relational databases and unstructured content coming from other file systems and content management systems and possibly even external social media, like filtering social media," he said. "What I'm getting at is the feds it's clear that they're just putting money into big data for various project but they're not hinging that on any specific approach for doing big data.
"The bottom line is this is not so much that they have a Hadoop initiative, but clearly a lot of that $200 million will be invested in solutions that probably incorporate a lot of Hadoop, because that's the latest and greatest new approach for doing petabytes of data in real time and in real time coming from multifarious sources, structured to unstructured."
Randall Jackson, vice president of public sector at operational database company MarkLogic, pointed out that federal agencies will need to choose carefully to find the right solution for the particular Big Data problem they want to address.
"Integration is certainly a challenge for organizations which need to bring together and use structured and unstructured data," Jackson said. "There is a portfolio of powerful software tools, including Hadoop, available to solve Big Data challenges. It's imperative to choose the right set of tools to mitigate the amount of integration needed."
OK, so Big Data integration, on a tactical level, may not be a major challenge.
Then again, technology is seldom the biggest problem on the road to integration. What's really tricky is integration on a strategic level.
If politics is, as political scientist Harold Dwight Lasswell contended, "who gets what, when, and how," then data integration is, at its core, political. Integrating data requires tunneling through organizational silos, convincing people to share, and establishing data governance that allows integration while still maintaining the data's integrity.
President Obama's announcement sends a strong top-down mandate that it's time to break down the walls of data.
"Today's event highlights how cooperation and collaboration among agencies, researchers, the private sector, universities and others can overcome these challenges to unlock the true power of big data for a more effective government," Jackson said. "I think this could signify a tremendous step forward for technology and within the public sector."
The question is, can the federal government overcome the gravity of its own bureaucracy to tackle the integration challenges it faces? In the years I've been covering integration, I've seen the federal IT system make huge strides that suggest it can, as long as it doesn't forget the integration challenges that come with data, both big and small.