For all the hype surrounding the NoSQL movement, SQL remains the lingua franca for queries across both relational and other types of emerging databases. In fact, one of the fastest growing use cases for SQL is on top of Hadoop clusters. This week, Hortonworks underscored that point at the DataWorks Summit/Hadoop Summit conference in Munich with an update to the Hortonworks Data Platform (HDP) that adds support for an instance of Apache Hive 2.0 with Live Long and Process (LLAP) capabilities, which lets Hive run queries in memory.
Rather than having to invest in a commercial relational database, Hortonworks CTO Scott Gnau says HDP 2.6 provides all the advantages of SQL running on a platform that can handle several orders of magnitude more data. Because Hive 2.0 with LLAP runs in memory, any query against that data can now be processed in the sub-second range, says Gnau.
Armed with these capabilities, Gnau says, many IT organizations will increasingly migrate analytics applications off relational databases in favor of directly accessing data stored in Hadoop.
“It’s now really only a matter of when,” says Gnau.
Other enhancements provided in HDP 2.6 include support for version 2.1 of the Apache Spark in-memory computing framework, as well as for Apache Zeppelin, an open source project for creating interactive notebooks for data analytics. In addition, Hortonworks has made it simpler to configure and secure a Hadoop cluster.
As more organizations opt to employ Hadoop as the primary platform for storing data, it only makes sense to move as many of the engines for processing that data as close to Hadoop as possible. The only question now is how many of those engines should run on the Hadoop platform itself and how many should continue to run on additional dedicated infrastructure.