When dealing with massive amounts of information, one of the inherent challenges is moving that data in and out of any given system. In particular, with the rise of the open source Hadoop framework for managing Big Data, IT organizations are finding that loading data into Hadoop is a time-consuming process.
Pervasive Software is taking a crack at solving that problem with the release this week of Pervasive Data Integrator v10 - Hadoop Edition, an extract, transform and load (ETL) tool that uses the company's parallel-processing engine to accelerate the movement of data into and out of Hadoop.
According to Mike Hoskins, Pervasive CTO and general manager of Pervasive Big Data products and solutions, Pervasive Data Integrator is a high-level ETL tool that allows organizations to create a visual representation of any data set. Then, with the click of a button, they can transfer that data into Hadoop.
Hoskins says he's seeing a rapid migration of analytics applications to the Hadoop platform, partially to save money, but also to move away from a SQL construct that was fundamentally designed to support transaction processing applications. In the absence of a framework such as Hadoop, organizations extended SQL to support business intelligence and analytics applications. But from a licensing perspective, those SQL-based approaches are significantly more expensive than running an analytics application on Hadoop, says Hoskins.
Pervasive Data Integrator, adds Hoskins, gives IT organizations a fast way to transfer data from existing analytics applications into Hadoop while eliminating the need to master cumbersome MapReduce interfaces to load that data.
Technology advances related to Hadoop have been coming fast and furious of late. While there are still many issues to be resolved, it's apparent that many of the fundamental IT challenges, ranging from how to quickly provision a Hadoop cluster to how to load data into it, are rapidly being addressed.