Having identified Big Data as a major strategic opportunity, IBM has every facet of its business working to optimize its product portfolio for Big Data applications.
Today, IBM is rolling out software and hardware offerings that include a series of in-memory computing extensions to its DB2 database, intended to significantly accelerate Big Data application performance. At the same time, the company is unveiling a turnkey Hadoop system based on its PureSystems platform.
The IBM PureData System combines an upgraded Hadoop distribution, IBM InfoSphere BigInsights, with the integrated server platform that IBM developed to converge the management of servers, storage and networking. The goal is to eliminate the complexity associated with setting up a Hadoop cluster, using a version of InfoSphere BigInsights that now supports SQL applications in addition to the MapReduce interface.
While making it easier to set up a Hadoop cluster, IBM is also improving the performance of DB2, especially in data warehouse environments.
Collectively known as BLU Acceleration, the new capabilities include a data-skipping feature that lets applications skip over data that doesn't need to be analyzed, such as duplicate information; the ability to analyze data in parallel across multiple processors; and the ability to analyze data transparently to the application, without the need to develop a separate data-modeling layer.
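To illustrate the general idea behind data skipping (this is a generic sketch, not IBM's implementation, and the function names are hypothetical): the engine keeps lightweight min/max metadata for each block of data, so a scan can bypass entire blocks that cannot possibly satisfy a query's predicate.

```python
# Generic data-skipping sketch: per-block min/max metadata lets a scan
# bypass blocks that cannot satisfy a predicate. Illustrative only;
# not IBM's actual implementation.

def build_metadata(blocks):
    """Record the min and max value of each data block."""
    return [(min(b), max(b)) for b in blocks]

def scan_greater_than(blocks, metadata, threshold):
    """Return values > threshold, skipping blocks whose max rules them out."""
    results = []
    for block, (lo, hi) in zip(blocks, metadata):
        if hi <= threshold:  # the whole block can be skipped
            continue
        results.extend(v for v in block if v > threshold)
    return results

blocks = [[1, 2, 3], [10, 12, 11], [4, 5, 6]]
meta = build_metadata(blocks)
print(scan_greater_than(blocks, meta, 9))  # → [10, 12, 11]
```

Here only the second block is actually read; the other two are eliminated by their metadata alone, which is where the speedup comes from on large tables.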
BLU Acceleration also provides what IBM calls "actionable compression," under which data no longer has to be decompressed before it can be analyzed.
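One common way to make compressed data directly analyzable (a generic sketch under assumed techniques, not a description of IBM's encoding) is dictionary compression: distinct values are replaced by small integer codes, and predicates are evaluated against the codes themselves rather than the decoded rows.

```python
# Generic sketch of analyzing compressed data: with dictionary encoding,
# an equality predicate can be evaluated on the encoded values directly,
# without decompressing each row. Illustrative only.

def dictionary_encode(values):
    """Build a dictionary of distinct values and encode each row as a code."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]

def count_equal(encoded, dictionary, target):
    """Count rows equal to target by comparing integer codes, not raw values."""
    code_of = {v: i for i, v in enumerate(dictionary)}
    if target not in code_of:
        return 0
    t = code_of[target]
    return sum(1 for c in encoded if c == t)

cities = ["NY", "LA", "NY", "SF", "NY"]
dictionary, encoded = dictionary_encode(cities)
print(count_equal(encoded, dictionary, "NY"))  # → 3
```

The predicate is translated into the compressed domain once (look up the code for "NY"), after which the scan touches only compact integers, saving both decompression work and memory bandwidth.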
The end result of all these enhancements, IBM claims, is a DB2 system that runs reporting and analytics applications eight to 25 times faster.
According to Nancy Kopp, director of Big Data product strategy at IBM, the ability to run certain functions in memory separates IBM DB2 from rival relational database systems that are bound by the limitations of hard disk drive performance.
As Big Data technologies evolve, Kopp says, IBM is seeing three major Hadoop usage patterns: organizations are increasingly archiving data on inexpensive Hadoop clusters; they are using Hadoop to pre-process data before running it through a SQL-based data warehouse application; and they are setting up "data labs" that let analysts explore large amounts of data.
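The pre-processing pattern Kopp describes can be sketched in miniature (a hypothetical illustration, not an actual BigInsights job): raw records are boiled down MapReduce-style into compact aggregates before being loaded into the SQL warehouse.

```python
# Hypothetical sketch of pre-processing raw data with a MapReduce-style
# job (here, counting event types) before loading the much smaller
# aggregate into a SQL warehouse. Illustrative only.
from collections import Counter

def map_phase(records):
    """Emit (key, 1) pairs for every word in every raw record."""
    for record in records:
        for word in record.split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

raw_events = ["error disk", "error net", "ok"]
aggregate = reduce_phase(map_phase(raw_events))
print(aggregate)  # → {'error': 2, 'disk': 1, 'net': 1, 'ok': 1}
```

The warehouse then only has to ingest the small aggregate table rather than the full raw event stream, which is the economic argument for doing this step on an inexpensive Hadoop cluster.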
Collectively, these technologies are transforming the data warehouse from being primarily a SQL-based platform into a federated application that spans SQL databases and a range of NoSQL technologies. In addition, Kopp says other sources of data such as stream analytics or time-series data found in an IBM Informix database are being brought into play as well.
While it’s up to each organization to decide which technologies to deploy inside a data warehouse, one thing is certain: in a world where a SQL database is only one component of the data warehouse, the job of the typical database administrator will never be the same again.