Information Week recently published an excellent and, from my point of view, entertaining Q&A on Big Data with IBM’s Information Management General Manager Bob Picciano.
For me, the interview confirms two trends I’ve noticed and written about: Hadoop is significantly changing ETL, and vendors are making it hard to separate data’s use from the hardware it runs on.
Evidence for the latter point is that Picciano spends a good part of the piece explaining the differences among IBM’s various “Big Data” offerings: the Netezza data warehouse appliance, InfoSphere BigInsights (its Hadoop distribution), and BLU Acceleration for DB2.
Information Week flat-out remarks that BLU for DB2 “seemed like a throwback to 2009-2010, when IBM acquired Netezza.” Picciano says it’s still a “red-hot” problem for companies to handle data marts, and emphasizes BLU’s in-memory processing ability.
He also calls out Cassandra as “not highly deployed,” MongoDB as winding up in a “low-end, very low-margin market for entry-tier SQL light,” and SAP’s HANA approach to in-memory applications:
“In the comparisons that we’ve run it has been an in-memory-to-in-memory comparison because that’s their environment. But remember that when Hana runs out of memory, it’s useless. That’s a big bet for your company when you’re, maybe, trying to go through a year-end close or the quarterly close and you find out that Hana was misconfigured.”
Ouch.
But, this being an integration blog, I’m particularly interested in his assertion that Hadoop will “create disruption” – his words, not mine – in ETL (extract, transform, load) technology.
As a “poster child” example, he pointed to General Motors, where CIO Randy Mott is using a Teradata enterprise data warehouse and a new generation of extract-load-transform capabilities that rely on Hadoop as the transformation engine.
“IBM BigInsights is the Hadoop engine and we’re taking our DataStage [data transformation] patterns into Hadoop,” he added.
Which brings us to another big point in the interview: streaming analytics. Picciano sees this in-motion approach to data as IBM’s standout capability.
It’s no longer enough to know what question to ask, he says: Sometimes, it matters when you ask a question.
“In a big data world, sometimes the best thing to do is persist your question and have the data run through that question continuously rather than finding a better place to persist the data,” he said. “We think there’s real value for our clients around Hadoop and data in motion.”
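To make the idea of “persisting the question” concrete, here is a minimal sketch in plain Python. It is not IBM’s streaming API or anything described in the interview; the function name `standing_query`, the window size, and the threshold are all hypothetical, chosen only for illustration. The question (“is the average of the last few readings above a threshold?”) stays in place while the data flows through it, and only a small window of recent values is ever held.

```python
from collections import deque
from typing import Iterable, Iterator


def standing_query(
    readings: Iterable[float], window: int, threshold: float
) -> Iterator[float]:
    """Hypothetical persisted question: yield the rolling average
    whenever the last `window` readings average above `threshold`."""
    recent = deque(maxlen=window)  # only the current window is kept
    for value in readings:
        recent.append(value)  # the oldest reading drops off automatically
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                yield average  # answer emitted as the data arrives


# A made-up stream of sensor readings, consumed one value at a time.
stream = iter([10.0, 12.0, 30.0, 40.0, 11.0])
alerts = list(standing_query(stream, window=2, threshold=20.0))
print(alerts)
```

The contrast with a warehouse-style approach is that nothing here is loaded into a store and queried later: the answer is available the moment the data passes through, which is the “when you ask” point Picciano is making.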
There’s a lot to this article besides touting IBM, however, including real-world uses of real-time analytics and the five main use cases for Big Data. Check it out.