With Big Data, what we’re seeing is a separation of two functions historically handled in the database, explains Datameer CEO Stefan Groschupf in a recent video.
“I think the really big observation is that historically you had this thing called database, where you interacted with the data and also you analyzed the data,” Groschupf says. “That’s splitting into two different systems that are more special purpose or highly optimized.”
Specifically, those two systems are:
- The read/write function of data storage, which in the evolving contemporary data infrastructure adds NoSQL solutions such as MongoDB to the traditional data warehouse.
- Analytics and data processing. While Hadoop is technically built around a distributed file storage system (HDFS), the real excitement is using it to analyze the data via MapReduce or Spark or whatever else comes along.
What’s driving analytics out of the traditional database is that you have more analytical capabilities and power at a lower cost with Big Data systems, he adds.
But there are a few problems with Big Data technologies. First, as many experts have pointed out, moving data out of operational stores can result in losing data’s heritage and valuable metadata, both of which matter for compliance and accuracy.
Second, there’s the matter of integration and the fact that data can be in different formats. Both problems become even more significant when you’re dealing with large batches of streaming data, a la the Internet of Things. Groschupf points out that that’s why we should be talking about “data streams” rather than “data lakes.”
Groschupf and others contend that Lambda Architecture can solve these and other long-standing problems by using data in a flexible but reliable way. In fact, Groschupf’s short video presentation is actually an answer to an off-camera question about Lambda, but to really understand what it does, I recommend this blog post by Savi Technology’s Chief Architect Jim Haughwout.
Savi specializes in devices and sensor data — Internet of Things technology, basically — and has worked for decades with the U.S. military to deploy device technologies. The company has a long history with streaming data and other Big Data problems. So, it’s no small thing when its chief architect says Lambda Architecture is as big as the browser, open source distribution and app stores.
“In the past 25 years I have seen four things that really made me step back and say, ‘This changes everything,’” writes Haughwout. “The most recent was the Lambda Architecture. Yep, it is that big.”
What makes Lambda Architecture such a big deal? Haughwout explains by way of a comparison. If traditional database architectures are fast food menus, requiring a lot of time, marketing and effort to change, then Lambda is like the pantry of a great chef. “You have all these ingredients,” he writes. “There are so many more options.” And you can put them to use much, much sooner.
Haughwout says Lambda solves these long-standing challenges:
- Preserving data in its original form, never changed or destroyed.
- Keeping data raw, rather than converting it into an arbitrary format or schema. Then, if you decide you need a component of the data later, it’s still there. Hooray for the data hoarders!
- Engineering data “to allow it to be as easily reinterpreted as you learn.” Why does this matter? It makes reinterpreting data fast and fault-tolerant, he explains.
- Supporting real time with two points of view: a just-in-time view and a deep cross-sectional view. “This lets you make decisions quickly without sacrificing the 100 percent loss-less accuracy needed for important business areas (such as finance, medicine, or mission-critical operations),” he adds.
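To make those points concrete, here is a minimal, illustrative sketch of the batch-plus-speed pattern they describe. This is a toy, not any vendor’s implementation: the class and method names are invented, and it assumes a simple running-total aggregation over keyed events.

```python
class LambdaStore:
    """Toy Lambda-style store: an immutable master dataset, a batch view,
    and a speed-layer merge at query time."""

    def __init__(self):
        self.master = []      # append-only log of raw events (never changed)
        self.batch_view = {}  # precomputed totals per key (batch layer)
        self.batch_len = 0    # number of events the batch view covers

    def append(self, key, value):
        # Raw data is only ever appended, never updated or deleted.
        self.master.append((key, value))

    def run_batch(self):
        # Batch layer: recompute the view from the full raw history.
        # Because the raw log is intact, it can be reinterpreted later
        # with an entirely different aggregation if needed.
        view = {}
        for key, value in self.master:
            view[key] = view.get(key, 0) + value
        self.batch_view = view
        self.batch_len = len(self.master)

    def query(self, key):
        # Speed layer: fold in events that arrived after the last batch
        # run, so queries see fresh data without waiting for the next batch.
        total = self.batch_view.get(key, 0)
        for k, v in self.master[self.batch_len:]:
            if k == key:
                total += v
        return total
```

A query therefore combines the deep, recomputed view with the just-in-time tail of recent events, which is the “two points of view” Haughwout describes.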
You can see why he says it’s a big deal, but here’s what you should also remember: It’s an architecture, meaning a smart way to combine Big Data technologies. That means you can’t buy it from a vendor or build it in a weekend. It also means you can build different variations on the theme. For instance, here is how James Kinley envisioned Lambda Architecture using HBase and Impala as the technologies.
Groschupf gives us another example of how this might work. NoSQL becomes your system of record, while Hadoop powers your analytics.
“It’s totally logical that you pull data from the operational data store, you put it in Hadoop and you do analytics and then push into a serving system that’s in-memory that runs your web site or something,” he explains.
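The pipeline he describes could be sketched roughly as follows. The function names and toy data here are invented for illustration (there is no real API being shown): an operational store is read, a batch analytics step aggregates it, and the result is pushed into an in-memory serving structure.

```python
def extract(operational_store):
    # 1. Pull raw records from the operational (NoSQL) system of record.
    return list(operational_store)

def analyze(records):
    # 2. Batch analytics (the Hadoop/MapReduce step in Groschupf's
    #    example): here, count page views per user.
    counts = {}
    for user, _page in records:
        counts[user] = counts.get(user, 0) + 1
    return counts

def load(serving_cache, results):
    # 3. Push the computed view into an in-memory serving system
    #    that the web site reads from.
    serving_cache.update(results)

# Toy operational data: (user, page) view events.
operational_store = [("alice", "/home"), ("bob", "/about"), ("alice", "/buy")]
serving_cache = {}
load(serving_cache, analyze(extract(operational_store)))
# serving_cache now holds per-user view counts for the serving layer.
```

The point of the shape, as Groschupf notes, is that each stage runs on the system optimized for it: the record store for reads and writes, the batch system for heavy analytics, and the in-memory layer for fast serving.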
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.