At the EMC World 2015 conference today, Pivotal unveiled a faster implementation of the Pivotal Greenplum Database along with an implementation of the Apache Spark framework that runs on top of the Pivotal distribution of Hadoop. Both updates are part of the Pivotal Big Data Suite, and both are aimed at making it simpler for IT organizations to construct modern data warehouses that combine Hadoop with a massively parallel database.
Sai Devulapalli, product marketing manager for data analytics at Pivotal, says the EMC subsidiary is working to simplify the deployment of data warehouses by making the underlying Big Data technologies easier to integrate with one another.
Pivotal also unveiled the Pivotal Query Optimizer, a cost-based query optimizer for both the Pivotal Greenplum Database and HAWQ, the SQL engine that Pivotal created to run on top of Hadoop.
With the upgrade of the Pivotal Big Data Suite, the company is also delivering the first version of the Pivotal HD distribution of Hadoop based on the Open Data Platform, which Pivotal previously pledged to support along with Hortonworks.
Speculation about the future of the data warehouse continues to run rampant. Devulapalli says that while adoption of massively parallel processing (MPP) databases has so far been limited, the volume of data that IT organizations are starting to collect will drive demand for database platforms capable of processing that data in real time. A lot of data will also be processed on Apache Spark running on Hadoop, but Devulapalli contends that Apache Spark is not yet as mature or robust a platform as the Pivotal Greenplum Database.
It will still take a while for IT organizations to figure out how technologies such as MPP databases and Hadoop fit within a production data warehouse environment. But it is clear that the days of data warehouses built solely on traditional relational databases are coming to a close.