Combining operational data from other sources — particularly Big Data sets — is generating a lot of discussion as a “next step” for companies investing in Big Data. So it’s not surprising that Pentaho’s release of its new Business Analytics 5.0 platform is generating some buzz.
Pentaho calls this release a “complete redesign and overhaul of its data integration and analytics platform,” according to IDG. The reason for the overhaul: Pentaho wanted to build a solution from the ground up that could address “data blending” and make it easier for the end user.
Which raises the question: What, exactly, is “data blending”?
Pentaho defines it as “blending” data from other sources to make your data sets more valuable and more insightful, the article explains. That sounds simple enough, but the problem lies in trying to actually do it.
The traditional way to handle this is through data integration using a relational database, but that’s not possible when you’re dealing with massive volumes of data or a time crunch, Pentaho’s Chief of Data Integration Matt Casters explained.
Blending data actually requires a pretty complex architecture, especially since traditional BI tools rely on SQL with its structured, relational format.
Big Data, on the other hand, is all about large, unstructured data. Datanami provides a great explanation of this if you’re interested, but the bottom line is that combining these two worlds is a challenge, if not an outright nightmare.
As data warehouse and BI consultant Martin Rennhackkamp points out, there are two ways to work with Big Data in the BI ecosystem:
- Force structure on the Big Data as early as possible in the process by filtering the relevant parts. Great, let’s do that…except, sometimes you can’t, like when you need to store and process the data in Hadoop or a NoSQL database.
- Upgrade your BI environment to a more complex BI ecosystem that includes Hadoop or a NoSQL database.
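Rennhackkamp’s first option, forcing structure early, is easier to picture with a quick sketch. Here’s a minimal, entirely hypothetical Python example (the record fields are invented) that filters the relevant parts out of semi-structured log records as early as possible, discarding anything that doesn’t fit the schema:

```python
import json

# Semi-structured "big data" records, e.g. raw event logs.
# Field names here are invented for illustration.
raw_events = [
    '{"user": "alice", "action": "click", "ts": 1700000000, "payload": {"x": 1}}',
    '{"user": "bob", "action": "view", "ts": 1700000050}',
    'not valid json at all',
]

def force_structure(lines):
    """Filter out the relevant, well-formed parts as early as possible,
    yielding flat rows ready for a relational table."""
    rows = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # discard records that don't fit the schema
        rows.append((event["user"], event["action"], event["ts"]))
    return rows

print(force_structure(raw_events))
```

The point of the sketch: by the time the data reaches the BI layer, only the structured, relevant slice remains, which is exactly why this approach fails when you need to keep the raw data around in Hadoop or a NoSQL store.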
It turns out this can involve several different tactics, such as bringing the ETL process to the data (rather than bringing the data to the processing); dynamic data exploration, such as a separate data sandbox; and data blending, which, by the way, means you need to include a semantic layer.
Rennhackkamp defines data blending as an ETL replacement that “minimizes data movement and increases data availability between co-existing databases, with the data often times housed in different types of structures.”
On the plus side, data blending lets you handle the data integration, data quality, metadata management and data governance together.
Pentaho’s platform simplifies the work you need to do to combine the SQL world with the data integration world, according to Casters.
“At first glance, it seems that the worlds of data integration and SQL are not compatible,” Casters is quoted as saying. “However, SQL itself is a mini-ETL environment on its own as it selects, filters, counts and aggregates data.
“So we figured that it might be easiest if we would translate the SQL used by the various BI tools into Pentaho Data Integration transformations. This way, Pentaho Data Integration is doing what it does best, not directed by manually designated transformations but by SQL.”
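Casters’ point that SQL is a mini-ETL environment on its own is easy to demonstrate: each clause of an aggregate query maps onto a discrete transformation step. The sketch below (plain Python with SQLite, just an illustration of the idea, not Pentaho’s actual translation engine) runs the same logic both ways and gets the same answer:

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

rows = [("books", 10), ("books", 5), ("toys", 7)]  # sample (category, amount) data

# The SQL a BI tool might emit: select, filter, group, and aggregate in one statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT category, SUM(amount) FROM sales "
    "WHERE amount > 4 GROUP BY category ORDER BY category"
).fetchall()

# The same query expressed as a chain of ETL-style transformation steps,
# the kind of dataflow a data-integration engine executes.
filtered = [r for r in rows if r[1] > 4]                # WHERE amount > 4
ordered = sorted(filtered, key=itemgetter(0))           # sort so groupby works
etl_result = [(cat, sum(a for _, a in grp))             # GROUP BY + SUM
              for cat, grp in groupby(ordered, key=itemgetter(0))]

print(sql_result == etl_result)  # the two pipelines agree
```

Because the two forms are equivalent, an engine that already knows how to run the transformation steps can accept the SQL instead, which is the gist of what Casters describes.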
Pentaho isn’t the first company to support data blending, although I believe it is the first to talk about data blending as a way to combine operational data with Big Data.
Tableau Software also uses the term “data blending” to describe how it accesses multiple sources of heterogeneous data, such as data on SQL Server with data in Microsoft Excel, and combines them on a single worksheet.
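To make that concrete, here’s a toy sketch (plain Python with invented source and field names; not how Tableau actually implements it) of blending a “database” source with a “spreadsheet” source on a shared linking field:

```python
# Primary source (imagine rows pulled from SQL Server) and secondary source
# (imagine rows read from an Excel sheet). All names are hypothetical.
sales = [
    {"region": "East", "revenue": 100},
    {"region": "West", "revenue": 80},
]
targets = [
    {"region": "East", "target": 90},
    {"region": "West", "target": 120},
]

def blend(primary, secondary, key):
    """Left-join the secondary source onto the primary on a shared
    linking field, so blended fields appear side by side on one sheet."""
    lookup = {row[key]: row for row in secondary}
    return [{**row, **lookup.get(row[key], {})} for row in primary]

for row in blend(sales, targets, "region"):
    print(row["region"], row["revenue"], row["target"])
```

The “blend” here is just a lookup keyed on the linking field, which is why the approach is fast and interactive: no data warehouse or upfront ETL job is required before the two sources can be viewed together.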
BI Research founder Colin White wrote about Tableau’s data blending capabilities for Inside Analysis, comparing and contrasting it to data virtualization and data federation.
“Data virtualization supports an integrated data abstraction layer to the dispersed data, while data federation provides the technologies to efficiently access the dispersed data. Data blending provides fast, easy and interactive data access,” White wrote. “Of course these technologies are not mutually exclusive.”
Cutting through all the technology mumbo jumbo, what does data blending offer the business? It’s a way to overcome technology silos so you can gain more insight into your existing customers and improve their experience, according to Pentaho’s CEO, Quentin Gallivan.
“True ‘big picture’ insights happen when operational data sources are blended with big data sources,” Gallivan told CIO.com. “Companies that compete largely on service, in industries like telecommunications and financial services, see big data blending’s potential to help them gain market-share by providing the most personalized and interactive customer experience.”