Terilyn Palanca, Pentaho Corporation’s director of Big Data Product Marketing, explains the company’s recent Big Data unveiling, The Adaptive Big Data Layer, to IT Business Edge’s Loraine Lawson.
Lawson: Is this your first Big Data announcement — I thought you already had Big Data connectors?
Palanca: We had connectors, but what we’ve done, and the reason we’re now naming the layer so that people can conceptually understand what this is all about, is that we’ve not only added to the connectors, we’ve also deepened them to some degree. We also want to convey that this entire layer of plug-ins, although it’s “an abstraction layer,” is a native integration to all these data sources. It is not something like JDBC or ODBC for SQL, where you’re basically going after the lowest common denominator and doing some translation that is going to, in some fashion, impact your performance, your functionality, or both.
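The lowest-common-denominator pattern Palanca is contrasting against can be seen in any generic SQL bridge: the client speaks one standard interface and a driver translates underneath, so only features common to every backend are exposed. A minimal sketch of that pattern using Python’s standard DB-API with SQLite as a stand-in backend (the schema and data here are purely illustrative):

```python
import sqlite3

# Any DB-API driver exposes the same generic surface: connect(),
# cursor(), execute(), fetchall(). Backend-specific capabilities
# (e.g., HBase column families, raw HDFS file layout) have no
# place in this interface -- that is the "lowest common denominator."
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])
rows = cur.execute("SELECT name FROM customers ORDER BY id").fetchall()
print(rows)  # [('Acme',), ('Globex',)]
conn.close()
```

A native integration, by contrast, talks to each store in its own terms rather than through a single translated SQL surface.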
These are still native integrations, but we felt it was important for the market to understand the range to which we’ve now delivered these. We’ve just refined our connectors into Hadoop, so they’re even easier to access. They always were; nothing has really changed on that level, but we’ve deepened them. We’ve added more versions of the distros. For instance, we’ve added the most recent distros from Cloudera, so we’ve now enabled (Cloudera) Impala access. Things like that. We just want to get the concept across that this whole series of plug-ins is extremely deep and broad. We thought the best way to do that was to put a name to the whole concept and call it the “Adaptive Big Data Layer.”
Lawson: Is it something that acts as a cohesive management layer and actual middleware layer? Or is it just sort of a collection of adapters?
Palanca: Well, it is a collection; however, our data integration is more than just ETL and connectors. We do scheduling, we do orchestration. We can even support Oozie orchestration here. So it’s more than just connectors, in that it gives you all that capability around how the actual workload flows.
The other point to make is, again, with being a native integration we are not limited to, for instance, Hive in Hadoop, which is what a lot of traditional vendors are using because they understand SQL. They’re looking for a way to speak SQL or relational schema concepts directly into Hadoop. You can do that with us, but you can also access HDFS directly. You can access HBase. You can do whatever you need to do, and we even allow you to create visual MapReduce jobs so that you never have to code a line of MapReduce if you don’t want to. You can do that through a GUI. We can even run our embedded engine directly on Hadoop. So there’s much more to us than just connectors.
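For context on what a visual MapReduce designer abstracts away, this is the shape of the hand-written job it replaces: a word-count mapper and reducer in the Hadoop Streaming style. This is a minimal local sketch under stated assumptions; a real job would read from HDFS and be driven by the Hadoop runtime, not by this script:

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # Emit (word, 1) for every word, as a streaming mapper would.
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    # Sum the counts for one word after the shuffle/sort phase.
    return word, sum(counts)

def run_job(lines):
    # Simulate map -> shuffle/sort -> reduce on local data.
    pairs = sorted(kv for line in lines for kv in mapper(line))
    return [reducer(word, (c for _, c in grp))
            for word, grp in groupby(pairs, key=itemgetter(0))]

result = run_job(["big data big layer", "data layer"])
print(result)  # [('big', 2), ('data', 2), ('layer', 2)]
```

Even this toy version shows why a GUI appeals to SQL-oriented teams: the map, shuffle, and reduce stages all have to be reasoned about explicitly in code.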
Lawson: What are your use cases for this and the limitations?
Palanca: The use cases are broad. Again, we believe the world is moving toward blended data: data is going to remain distributed, and we’re not going to see Hadoop suddenly displace everything that’s out there. It is yet another data store, a special-purpose data store. So we feel the real strength is making it an equal-class citizen, and making all the NoSQL stores equal-class citizens, so that your analytic workloads can use whichever stores you need. Your analytics can be placed on these kinds of data stores if you want, or you can grab data from these and other data stores, mix it, join it, integrate it together, and use that as your picture for analytics, hosting it wherever you want it to be.
So basically, the way to think of Pentaho is we aren’t just data integration. We aren’t just BI. We are that complete end-to-end data management for analytics. So our real goal is to give you every capability necessary to really manage the flow of data throughout the analytic process.
That includes delivering the analytics as well as delivering the actual data management aspect of that. The reason we don’t use the words “data management” is that people think of that as actual storage: if you’re speaking data management, you’ve got a data store. Well, we’re not a data store, but we get you to every data store possible, and we make Hadoop and the NoSQLs equal-class citizens with everything else in that picture.
One of the use cases we see emerging as a constant is the idea of the extended customer view. What we used to think of as just blending our CRM system with whatever was coming along in our transaction systems is now a much bigger picture. You have to know everything that’s going on in a customer’s social media. You have to know about anything in the external world, in unstructured data, that’s affecting that customer. We enable you to actually see all of that, because we can get to all the data stores, put that picture together for you, and then deliver it out to your analytics.
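The blending Palanca describes is, at its core, a multi-source join keyed on customer identity. A minimal sketch with hypothetical record shapes (the field names and the in-memory “stores” below are illustrative assumptions, not Pentaho’s API), merging CRM rows, transactions, and social-media mentions into one extended view:

```python
# Hypothetical records standing in for three separate data stores.
crm = [{"cust_id": 1, "name": "Acme"},
       {"cust_id": 2, "name": "Globex"}]
transactions = [{"cust_id": 1, "amount": 120.0},
                {"cust_id": 1, "amount": 80.0}]
social = [{"cust_id": 2, "mention": "support complaint"}]

def extended_view(crm, transactions, social):
    # Left-join each auxiliary source onto the CRM record by customer id,
    # producing one blended row per customer for downstream analytics.
    view = []
    for c in crm:
        cid = c["cust_id"]
        view.append({
            **c,
            "total_spend": sum(t["amount"] for t in transactions
                               if t["cust_id"] == cid),
            "mentions": [s["mention"] for s in social
                         if s["cust_id"] == cid],
        })
    return view

views = extended_view(crm, transactions, social)
print(views[0])  # {'cust_id': 1, 'name': 'Acme', 'total_spend': 200.0, 'mentions': []}
```

In practice each list would be a different physical store (relational CRM, transactional database, a NoSQL or Hadoop store holding social data), with the integration layer performing the fetch-and-join rather than in-process Python.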