Using In-Memory to Crunch Big Data in Record Time

Loraine Lawson

Fraud detection reduced from 45 minutes to four seconds. A 48-hour batch process, now performed in under six hours. Gary Nakamura, general manager of Terracotta, explains to IT Business Edge’s Loraine Lawson why the use cases for in-memory data processing speak for themselves.

Disclosure: Lawson also writes for B2B.com, a vendor-neutral site for B2B news owned by Software AG, which acquired Terracotta in May 2011.

Lawson: Terracotta’s approach to handling Big Data is to use in-memory processing. Tell me a little about that.

Nakamura: Terracotta has been around for seven years, and from the beginning we have been involved in building infrastructure for data performance and scale.

Let me just fast forward a little bit to the end of 2010. We were obsessed with solving the challenge of moving large amounts of data in-memory, and helping people work with this data without having to constantly tune it to keep the performance up.

We ended up with this very significant innovation, which was developing a way to store data in RAM. Literally, end users went from storing 50 gigabytes in memory in the first six months to a 9-terabyte deployment of data in-memory.

That was very significant for us and it was one of those situations where you had to pause and think about what had just happened. It was yet further proof that Big Data was no longer a marketing term and that the use cases were transforming businesses. Some of Terracotta’s customers were changing the way they were doing business using in-memory data management, such as moving from 45 minutes for fraud detection to less than four seconds across tens of terabytes of data. This was a highly significant breakthrough.

Next, we realized the confluence of where we were and why we were seeing this pattern with our customers’ deployments. There is just a massive amount of data being collected right now, and most of those customers or users are perplexed as to how to access and use this data in a way that extracts its value. This can include driving higher-performance applications or processes, delivering better SLAs, or enabling analytics.

We realized that we are in the middle of this confluence of cheap memory, where RAM prices are plummeting, machines are getting bigger and bigger, and at the same time there is an explosive growth of data. People are collecting data in such vast amounts every day that it is our belief that in five years “big” will become the norm. RAM, because of the requirements around performance and access to that data, will become the new disk.


Because of the explosive growth of data, a business problem has been created: How do you extract value from it? A second, larger problem is that the prevailing technologies cannot keep up with the growth requirements of Big Data to deliver acceptable performance.

The Big Data problem exists because databases cannot meet the demands and neither can most data warehouses. Consequently, organizations are trying to figure out a way to slice and dice the data so they can actually make use of what they are collecting. It’s an interesting time for data management.

Lawson: Terracotta is owned by Software AG. How does Terracotta fit in with Software AG’s business model?

Nakamura: The courtship started when Software AG was looking for ways to transform its existing products. One of the things they were running into is this whole idea of data becoming very, very large, and their technologies are very data-centric. Software AG was looking to revamp its product line with an in-memory solution to solve the Big Data problems its customers were facing, so they went out looking for an OEM partner.

Once they found Terracotta, the vision they had was that BigMemory would be the core of Software AG’s infrastructure. The company knew it could solve a lot of problems in its customer base if the data was in-memory, whether it’s analytics, real-time analytics or complex event processing. A lot of these processes and applications are very easily solved if you are holding the data in memory in a normalized format. The transfer from the relational database world to the data warehouse world becomes less of an issue, and it’s more about working with data in the memory space rather than in siloed, disconnected operations. In a little over a year, Terracotta and BigMemory have become the cornerstone of Software AG’s data management strategy, which they call the Next-Gen Data Management Strategy.

Lawson: What about data quality issues? How do you deal with them when using in-memory? This question originally came up on a blog post I’d written about how in-memory is, in some cases, taking over from ETL batch processing, because you’re moving the heavy data lifting in-memory. ETL tools now have integrated data quality tools, so a reader was asking what happens to data quality if you’re replacing ETL with in-memory.

Nakamura: One of the problems with extracting data from one data silo and moving it to another is that some data may get lost in translation. If important data or metadata associated with the generic data gets lost in translation, it can devalue the data, effectively turning it into garbage.

In the in-memory world, if you can normalize all of this data without losing anything in translation, you mitigate that risk, and the data retains its value when you actually move it to the other side.

The other thing we’ve seen with batch and integration is that, in some cases, it obviates the need to take the data and replicate it across different applications and silos. If you have a common in-memory store that everybody can read in an application-level format, it might obviate the need for many integration use cases. This isn’t going to happen overnight, given the abundance of data software already in place.

On the mass processing side, we have many customers that are utilizing our in-memory Big Data solution, BigMemory, to store the data that’s needed to do batch processing. This lets them execute their batch processes much, much faster. For example, Kaiser has a use case that took 48 hours to run as a batch that they now do in less than six hours using Terracotta’s BigMemory.
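
As a rough illustration of the pattern Nakamura describes, a batch job can pre-load its working set into an in-memory cache and then read it at memory speed instead of querying a database for every record. The sketch below assumes the open-source Ehcache 2.x API that BigMemory extends; the cache name, record type and loader methods are hypothetical stand-ins, not Terracotta’s or Kaiser’s actual implementation.

```java
import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;

public class BatchWithInMemoryStore {

    // Hypothetical record type standing in for whatever the batch actually processes.
    record ClaimRecord(String id, double amount) {}

    public static void main(String[] args) {
        // Programmatic cache configuration. A real BigMemory deployment would add
        // off-heap sizing (typically in ehcache.xml); this uses plain on-heap Ehcache.
        Configuration config = new Configuration();
        config.addCache(new CacheConfiguration("claims", 1_000_000)); // hypothetical name/size
        CacheManager manager = CacheManager.create(config);
        Cache claims = manager.getCache("claims");

        // Pre-load the working set into RAM once, instead of hitting a
        // database row-by-row inside the batch loop.
        for (ClaimRecord claim : loadClaimsFromSource()) {            // hypothetical loader
            claims.put(new Element(claim.id(), claim));
        }

        // The batch job now reads its reference data at memory speed.
        for (String id : idsToProcess()) {                            // hypothetical id feed
            Element e = claims.get(id);
            if (e != null) {
                applyBatchRules((ClaimRecord) e.getObjectValue());    // hypothetical business logic
            }
        }

        manager.shutdown();
    }

    static Iterable<ClaimRecord> loadClaimsFromSource() { return java.util.List.of(); }
    static Iterable<String> idsToProcess() { return java.util.List.of(); }
    static void applyBatchRules(ClaimRecord claim) { /* batch business rules go here */ }
}
```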

Lawson: So how do you feel about this whole idea of replacement, that in-memory computing might replace batch integration? Would you not classify it as integration, or do you think that’s a misleading statement?

Nakamura: I don’t know that it’ll replace the notion of batch. We have customers that are utilizing in-memory for batch processing. They just want to do it at scale with high performance so they can hit their SLAs.

The problem with batch going forward is Big Data. Datasets are going to be much, much bigger, and the type of work you’re going to need to do against that data will be much more extensive, because the end user now has a mobile phone. It’s not just two-dimensional data anymore.

Lawson: Both in-memory and Hadoop address Big Data, but has your solution ever been used with a Hadoop system?

Nakamura: While we are solving many Big Data problems, we’re not the only solution. We address very specific and relevant problems of delivering data at high performance, at scale, across large data sets.

One of the big challenges with Hadoop is that while it scales, it’s not very fast at getting data out. Some of our end users are putting the result sets and the reporting from Hadoop into BigMemory and maintaining them there. We could have a 100-terabyte Hadoop instance where we’re providing a one-terabyte window on the data, and the end user has very fast access to that large scale of data. The analysis usually produces result sets and calculations that don’t live in Hadoop, and once the analysis is completed, end users need to store that information somewhere. They can either put it in a database, which is too slow, or they can put it in something like BigMemory.
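
A rough sketch of that windowing pattern is shown below: a reduced Hadoop result set is loaded into an in-memory map and lookups are served from RAM rather than by re-querying Hadoop. A plain ConcurrentHashMap stands in for a store like BigMemory, and the file path, line format and lookup key are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HadoopResultWindow {

    // Stand-in for an in-memory store such as BigMemory:
    // key = report dimension, value = pre-aggregated count from Hadoop.
    private static final Map<String, Long> window = new ConcurrentHashMap<>();

    public static void main(String[] args) throws IOException {
        // Assumes the reduced Hadoop output has been exported locally as
        // tab-separated "key<TAB>count" lines (hypothetical path and format).
        Path exported = Path.of("/data/export/part-r-00000");

        // Load the (much smaller) result set into RAM once per job run...
        try (var lines = Files.lines(exported)) {
            lines.map(line -> line.split("\t"))
                 .filter(parts -> parts.length == 2)
                 .forEach(parts -> window.put(parts[0], Long.parseLong(parts[1])));
        }

        // ...then serve interactive lookups at memory speed instead of
        // re-running a Hadoop job for every report or dashboard request.
        System.out.println("2013-06 alerts: " + window.get("2013-06"));  // hypothetical key
    }
}
```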

Lawson: So what are some of the other unusual use cases for in-memory or things we might not have heard about that you think are relevant or particularly promising?

Nakamura: We have a very large telecommunications customer in Australia that had three data silos, including Siebel and a mainframe. One of its big challenges was normalizing all of this siloed data to create a unified view of the customer.

One of their goals in this effort was to reduce their call center costs. They spend a million dollars a year on their call center, and they wanted to move up to 40 percent of their call center traffic to the Web. They are pulling all of this customer data from the three different sources and storing it in-memory. This represents 10 terabytes of data a day, or 100 terabytes by the end of 2013. They’ll now be able to rapidly analyze this data to cross-sell and upsell new services.
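
The unified-view idea can be sketched in a few lines: records from each silo are normalized and merged under a single customer key in an in-memory store, so a web portal or upsell engine reads one merged entry instead of joining three backend systems per request. The record types, field names and identifiers below are hypothetical, and a plain ConcurrentHashMap stands in for a distributed in-memory store such as BigMemory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UnifiedCustomerView {

    // Hypothetical, simplified shapes of the three silos' records.
    record CrmRecord(String customerId, String name, String segment) {}
    record BillingRecord(String customerId, double balance) {}
    record UsageRecord(String customerId, long callMinutes) {}

    // The normalized, merged view that would live in the in-memory store.
    record CustomerView(String customerId, String name, String segment,
                        double balance, long callMinutes) {}

    // ConcurrentHashMap as a stand-in for the in-memory data store.
    private static final Map<String, CustomerView> views = new ConcurrentHashMap<>();

    // Merge one customer's rows from the three sources into a single entry.
    static void load(CrmRecord crm, BillingRecord billing, UsageRecord usage) {
        views.put(crm.customerId(), new CustomerView(
                crm.customerId(), crm.name(), crm.segment(),
                billing.balance(), usage.callMinutes()));
    }

    public static void main(String[] args) {
        load(new CrmRecord("c-1001", "Alice Example", "consumer"),
             new BillingRecord("c-1001", 42.50),
             new UsageRecord("c-1001", 310));

        // A self-service portal or upsell engine reads the merged view straight
        // from memory instead of joining three backend systems per request.
        System.out.println(views.get("c-1001"));
    }
}
```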

Lawson: People do that now without in-memory.

Nakamura: They do, but it’s very, very slow and they can’t do it at scale. Instead they do it in snippets, which do not transform your business. With BigMemory, they are putting their entire customer base in-memory. What used to take 30 seconds now takes only 30 milliseconds.

Lawson: So you went from being a newspaper ad to an auctioneer there.

Nakamura: That’s precisely it. It just changes the game. Let’s think about 45 minutes for fraud detection: whoever stole whatever they stole is at home, looking at their new toy and drinking a beer. At four seconds, they’re not even out of the store yet.

We also have a number of government customers that have large scale data problems that they need addressed now. I can’t really give you details about them, but they're in the Department of Defense, Homeland Security, FBI and FAA.

What astounds me are our customers. Once you show them what they can do, the proof is in the pudding. They are so grateful because what once seemed impossible is now possible.


