David Jonker, director of Big Data strategy at SAP, explains to IT Business Edge’s Loraine Lawson how in-memory solutions like HANA will reduce IT complexity and the need for data integration. He also explains how in-memory tools will allow businesses to run more complex and historical data analysis within enterprise applications such as ERP, CRM and niche applications like demand signal management.
Lawson: I know about HANA, but I wondered if there are other areas where SAP has a Big Data strategy?
Jonker: Our strategy has multiple aspects, or multiple solutions. One half is very much a platform for Big Data, which includes our suite of analytic solutions — everything from BI enabled for Big Data to advanced analytic tools — as well as a real-time data management strategy and platform. That ranges from HANA at its center, analyzing massive amounts of data in real time, to archiving capabilities that can handle many petabytes of data with the SAP Sybase IQ product and Hadoop integration.
So one side is the platform, which incorporates all those pieces. The other side is very much an application strategy, because we believe that the true value of Big Data will be realized when you build those capabilities into your enterprise applications, i.e. ERP.
Lawson: Can you talk a little bit more about what you're doing on the application strategy and where you are now?
Jonker: Today, the market seems very focused on this offline analysis of data in Hadoop. Hadoop seems to be the thing that everyone thinks is the be-all and end-all for everything Big Data.
We do not believe that at SAP. We believe Hadoop has a place, but a Hadoop-centered strategy is actually a shortsighted strategy for Big Data.
Second, the customers that are successful in deploying Big Data solutions tend to be the ones that are very focused on a particular problem out of the gate, or looking for very particular signals, and are building their first foray into Big Data around that. The customers that struggle are the ones that start with, “Hey, we should be doing something with all of this data. Why don’t we start by just collecting all of it?”
That’s problematic. What we see is that while you need a single platform that simplifies your IT environment, because we think that’s fundamental to enabling this, there will be many applications built on top of that, applications that solve very specific problems. So that is our strategy for Big Data at SAP.
Lawson: Built on top of Hadoop specifically?
Jonker: No, our strategy is centered on SAP HANA, because at the end of the day, Big Data is about real-time access to data. While we are working to ensure Hadoop is part of the overall story, because Hadoop has a role to play in an overall strategy, it is not the center of a successful Big Data solution for Main Street. It seems to work well for Internet companies, but we believe that for Main Street, for the large enterprises out there today, in-memory will be the centerpiece of their Big Data strategy.
I would argue that that’s what we’re seeing in the marketplace. Even Cloudera is coming out with in-memory capabilities, because they recognize that Hadoop itself is not enough.
An April TDWI report specifically states that Hadoop is very batch-oriented and has real limitations in terms of your ability to apply it within a traditional business. If you're not overcoming those challenges, you're going to have some real problems with Hadoop. So that’s why HANA is at the center of it.
Let me also expand on that a little bit more. If you think about Big Data, often people hear the term Big Data and they instantly think, “Well, big means lots,” right? So it’s a volume problem and they center their thinking on volume. But it’s not actually a volume problem. Volume is a part of the problem, but it’s not the primary problem.
The primary problem is actually velocity. What a lot of people don’t realize is that most traditional relational databases have been able to store petabyte levels of data for many years. So storing large volumes of data hasn’t been the problem. The problem is that when you put too much data into a relational database, you can’t get it out fast enough. So volume creates a velocity problem, but the problem is velocity itself.
Variety does the same thing, right? Traditional relational databases have been able to store all kinds of data: images, video, that kind of thing. The problem is, you can’t analyze this stuff fast enough, right?
Hadoop, column databases, massively parallel processing and these grids of machines: They're all fundamentally trying to get around the bottleneck that is the disk. SAP says no, no -- you build the thing from the ground up for in-memory. And now we’re starting to see companies adopting SAP’s strategy.
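[Editor's note: the disk-versus-memory bottleneck Jonker describes can be illustrated with a minimal, hypothetical micro-benchmark. This sketch is not SAP's implementation or any real database; it simply scans the same synthetic rows once from a temporary file and once from an in-memory list. On a machine with a warm OS page cache the gap may be small, but the principle is the one he names: the same query over the same data, differing only in where the data lives.]

```python
import os
import tempfile
import time

# Hypothetical workload: 200,000 two-column CSV rows (i, 2*i).
N = 200_000
rows = [f"{i},{i * 2}\n" for i in range(N)]

# Simulate a disk-resident table by writing the rows to a temp file.
path = os.path.join(tempfile.gettempdir(), "rows_demo.csv")
with open(path, "w") as f:
    f.writelines(rows)

def scan_disk(path):
    """Sum the second column, reading each row from disk."""
    total = 0
    with open(path) as f:
        for line in f:
            total += int(line.split(",")[1])
    return total

def scan_memory(rows):
    """Sum the second column over rows already held in memory."""
    total = 0
    for line in rows:
        total += int(line.split(",")[1])
    return total

t0 = time.perf_counter()
disk_sum = scan_disk(path)
t_disk = time.perf_counter() - t0

t0 = time.perf_counter()
mem_sum = scan_memory(rows)
t_mem = time.perf_counter() - t0

# Same answer either way; only the access path differs.
assert disk_sum == mem_sum
print(f"disk scan:   {t_disk:.4f}s")
print(f"memory scan: {t_mem:.4f}s")

os.remove(path)
```

Real in-memory databases go much further (columnar layouts, compression, vectorized execution), but the core trade-off is the same one this toy scan exposes.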