IBM’s Take on Big Data and How Hadoop Is Changing Integration

Loraine Lawson

InformationWeek recently published an excellent and, from my point of view, entertaining Q&A on Big Data with IBM’s Information Management General Manager Bob Picciano.

For me, the interview confirms two trends I’ve noticed and written about: Hadoop is significantly changing ETL, and vendors are making it hard to separate data’s use from the hardware it’s running on.

Evidence for the second point: Picciano spends a good part of the piece explaining the differences among IBM’s various “Big Data” offerings, namely the Netezza data warehouse appliance, InfoSphere BigInsights (its Hadoop distribution), and BLU Acceleration for DB2.

InformationWeek flat-out remarks that BLU for DB2 “seemed like a throwback to 2009-2010, when IBM acquired Netezza.” Picciano responds that handling data marts is still a “red-hot” problem for companies, and he emphasizes BLU’s in-memory processing ability.

He also dismisses Cassandra as “not highly deployed,” predicts MongoDB will wind up in a “low-end, very low-margin market for entry-tier SQL light,” and takes aim at SAP’s HANA approach to in-memory applications:

“In the comparisons that we've run it has been an in-memory-to-in-memory comparison because that's their environment. But remember that when Hana runs out of memory, it's useless. That's a big bet for your company when you're, maybe, trying to go through a year-end close or the quarterly close and you find out that Hana was misconfigured.”


But this being an integration blog, I’m particularly interested in his assertion that Hadoop will “create disruption” (his words, not mine) in ETL (extract, transform, load) technology.

As a “poster child” example, he pointed to General Motors, where CIO Randy Mott is using a Teradata enterprise data warehouse and a new generation of extract-load-transform (ELT) capabilities that rely on Hadoop as the transformation engine.

“IBM BigInsights is the Hadoop engine and we're taking our DataStage [data transformation] patterns into Hadoop,” he added.

Which brings us to another big point in the interview: streaming analytics. Picciano sees this in-motion approach to data as IBM’s standout capability.

It’s no longer enough to know what question to ask, he says: Sometimes, it matters when you ask a question.

“In a big data world, sometimes the best thing to do is persist your question and have the data run through that question continuously rather than finding a better place to persist the data,” he said. “We think there's real value for our clients around Hadoop and data in motion.”
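Picciano’s “persist your question” idea is the core of stream processing. Instead of storing data first and querying it later, you register the question once and evaluate it against each record as it flows past. The Python sketch below is a hypothetical illustration of that inversion only. The class and function names, the moving-average question and the threshold are all assumptions made up for the example. None of it is IBM’s InfoSphere Streams API or any other product’s.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


class StandingQuery:
    """A question registered once and re-evaluated as each event arrives.

    Hypothetical illustration; not the API of IBM InfoSphere Streams or any
    other product.
    """

    def __init__(self, window_size: int, threshold: float) -> None:
        self.window_size = window_size
        self.threshold = threshold
        # Only the most recent events are kept; older ones fall out automatically.
        self.window: deque = deque(maxlen=window_size)

    def feed(self, value: float) -> bool:
        """Push one event through the question; True means the condition holds now."""
        self.window.append(value)
        if len(self.window) < self.window_size:
            return False  # not enough data yet to answer the question
        return sum(self.window) / self.window_size > self.threshold


def run_continuously(
    query: StandingQuery, events: Iterable[float]
) -> Iterator[Tuple[int, float]]:
    """Stream events through the standing query, yielding (position, moving average) on each match."""
    for position, value in enumerate(events):
        if query.feed(value):
            yield position, sum(query.window) / query.window_size


if __name__ == "__main__":
    readings = [10, 12, 11, 30, 35, 40, 12, 9]
    query = StandingQuery(window_size=3, threshold=25.0)
    for position, avg in run_continuously(query, readings):
        print(f"event {position}: moving average {avg:.1f} exceeds threshold")
```

With the sample readings, the question fires at positions 4, 5 and 6, as each of those events arrives. The readings themselves are never written to a store and queried afterward. Only a three-event window is held in memory.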

There’s a lot to this article besides touting IBM, however, including real-world uses of real-time analytics and the five main use cases for Big Data. Check it out.



Comments
Jul 16, 2013 3:02 PM LoriV says:
With all due respect to Mr. Picciano, I find the statement that SAP HANA is useless if it runs out of memory to be quite comical. Seriously, I can’t think of any disk-based systems that would be very useful if they ran out of disk space. Regardless, one doesn’t really need to worry about this with SAP HANA: if an SAP HANA system runs short on memory, columns (selected by LRU mechanisms) are unloaded from memory to the Data Volume (HANA-organized disks) in a manner that leverages database know-how, preventing the usual brutal swap activity of the OS. Of course, SAP offers scale-out capabilities with the SAP HANA platform so that customers can grow their deployments to multiple nodes, supporting multi-terabyte data sets. Here is a description of how HANA utilizes memory (http://wp.me/p1a7GL-lo). IBM is one of SAP’s best partners and is very active in the HANA community. SAP HANA sizing is done by our hardware partners. I’m confident IBM experts know how to properly size an SAP HANA system and will ensure their clients’ systems are sized appropriately.
Jul 17, 2013 1:35 AM Loraine Lawson says, in response to LoriV:
Thank you for that response. Well explained!
Aug 29, 2013 8:19 AM arindam at venturehire says:
Thanks for this post and the information.
Apr 23, 2015 1:02 PM Ratnesh says:
Great article! Basically, what SAP is doing is allowing you to use HANA as the hub for all your data. You can then use SAP Data Services to ETL your Hadoop data into HANA. Even more powerful is the fact that you can leave your data in Hadoop and use HANA to front it all, and so have the full power of the SAP BI Suite against all your data, whether it is in HANA or in Hadoop. More at www.youtube.com/watch?v=1jMR4cHBwZE
