At the Strata + Hadoop World 2015 conference this week, Cloudera announced that it has begun beta testing Kudu, a columnar data store that will run natively on top of Hadoop. The company also opened a public beta of RecordService, a role-based access control mechanism for Hadoop.
Mike Olson, chief strategy officer and chairman of the board for Cloudera, says that with the columnar data store, Cloudera wants to enable faster types of data analytics on top of Hadoop. As such, Kudu is not so much a replacement for the Hadoop Distributed File System (HDFS) as it is another type of storage engine that a different class of applications can now natively invoke, says Olson.
Columnar data stores are typically used with analytics applications that are closely associated with data warehouses. By adding support for a columnar store on top of Hadoop itself, Cloudera is again signaling that Hadoop will over time usurp most of the databases that currently support data warehouse applications.
Longer term, Olson says Cloudera envisions a world where real-time analytics, built using Apache Spark as the programming environment for Hadoop, will run in memory alongside the data being generated by transaction processing systems. The end result should be a way to attach analytics to transactions in real time without the level of complexity that processing transactions on relational databases involves today.
While Olson says the industry as a whole is a long way from making this possible, it represents an avenue of research that Cloudera plans to keep investigating.
In the meantime, Cloudera remains committed to unifying as much enterprise analytics processing as possible around Apache Spark, Kudu, its Impala SQL engine and Hadoop.