Recognizing that Hadoop and SQL database technology need to be joined at the hip in the enterprise, EMC Greenplum today announced Pivotal HD, an implementation of the company’s massively parallel database that is now integrated with the Hadoop Distributed File System (HDFS).
According to Josh Klahr, vice president of product management for EMC Greenplum, the benefit of this approach is that it allows organizations with massive investments in SQL to start using low-cost Hadoop implementations as a data warehouse.
Klahr says Pivotal HD is designed to let IT organizations run high-performance Hadoop applications using SQL syntax that is already widely known throughout the enterprise, rather than requiring them to learn an arcane MapReduce interface or invest in a data scientist who is fluent in MapReduce.
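To make the contrast concrete, here is a minimal sketch of the same per-region total written twice: once in a hand-rolled map/shuffle/reduce style and once as a single SQL statement. It is plain in-memory Python with SQLite, not Pivotal HD or the Hadoop API; the sample data and every function name are hypothetical and chosen only for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, sale amount) pairs.
SALES = [("east", 10.0), ("west", 4.0), ("east", 2.5), ("west", 1.0)]


def map_phase(records):
    """Map step: emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def reduce_phase(pairs):
    """Shuffle pairs into groups by key, then reduce each group to a sum."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


def totals_via_sql(records):
    """The same aggregation expressed declaratively as one GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))


mapreduce_totals = reduce_phase(map_phase(SALES))
sql_totals = totals_via_sql(SALES)
```

Both paths produce the same totals. The difference is that the first requires the author to spell out how the work is split and recombined, while the second only states the desired result, which is the skill set most enterprise analysts already have.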
Klahr says Pivotal HD delivers query response time improvements that range from 10 times to 600 times faster than current SQL options for Hadoop.
EMC Greenplum is not the only company trying to tightly couple SQL to Hadoop these days. As this trend continues to evolve, it’s becoming clear that Hadoop will soon replace relational databases across a broad swath of data warehousing applications. What’s not clear is exactly what role SQL will play. There are those who argue that Hadoop, by its very nature, eliminates the need for ad hoc SQL queries. Instead, the algorithms in the Hadoop application will discover patterns and anomalies. The schema will then be generated by Hadoop as part of the read operation (schema on read), as opposed to traditional SQL data warehouse applications, which define the schema as part of the write operation (schema on write).
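The difference between defining a schema at write time and applying one at read time can be sketched in a few lines. The example below is an illustration only, using in-memory Python and SQLite rather than Hadoop itself; the record layout, field names, and helper functions are all assumptions made for the sake of the sketch.

```python
import json
import sqlite3

# Hypothetical raw records, stored exactly as they arrived.
RAW_LINES = [
    '{"customer": "ann", "amount": "12.50", "region": "east"}',
    '{"customer": "bob", "amount": "7.25"}',
    '{"customer": "cy", "amount": "3.00", "region": "west", "coupon": "X1"}',
]


def load_schema_on_write(lines):
    """Schema on write: the table layout is fixed before any row is stored.

    Fields outside the declared columns (such as "coupon") are dropped at load time.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (customer TEXT NOT NULL, amount REAL NOT NULL, region TEXT)"
    )
    for line in lines:
        rec = json.loads(line)
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?)",
            (rec["customer"], float(rec["amount"]), rec.get("region")),
        )
    return conn


def read_with_schema(lines, schema):
    """Schema on read: raw lines stay untouched; each query supplies its own schema."""
    for line in lines:
        rec = json.loads(line)
        yield {
            name: (cast(rec[name]) if name in rec else None)
            for name, cast in schema.items()
        }


conn = load_schema_on_write(RAW_LINES)
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]

# A later question about coupons needs no reload, only a different read-time schema.
coupon_rows = list(read_with_schema(RAW_LINES, {"customer": str, "coupon": str}))
```

In the write-time path, the coupon field is lost unless someone alters the table and reloads the data. In the read-time path, the raw records are kept whole and the structure is decided per question, at the cost of doing the parsing work on every read.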
Naturally, it will take time for this transformation of data warehousing to play out, so SQL will remain relevant for some time to come. But the one thing that is certain is that Hadoop in the enterprise will forever change the way IT organizations think about building data warehousing applications.