As the management of Big Data moves from theory into practice, IT organizations are discovering that moving all that data around is no simple task. While the cost of storing massive amounts of information has fallen thanks to technologies such as the open source Apache Hadoop data management framework, moving large volumes of data around the enterprise remains as complex an undertaking as ever.
To address that specific issue, the folks at Pervasive Software, a provider of data integration tools and cloud computing services, came up with DataRush, which optimizes the processing of Hadoop data across every available core on multicore servers.
According to Pervasive Software CTO Mike Hoskins, one of the reasons Hadoop clusters tend to be large is that Hadoop was never designed to take full advantage of multicore processors. DataRush essentially adds a layer of software that processes Hadoop data in parallel across all the cores in a system, which allows the Hadoop environment to scale linearly.
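DataRush's own API isn't shown here, but the general idea of intra-node parallelism can be sketched with standard Java concurrency: split a batch of records into one chunk per available core and process the chunks on a thread pool instead of a single thread. The record type, the "ERROR" filter, and the MulticoreScan class name below are illustrative assumptions, not part of DataRush or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MulticoreScan {
    // Count records matching a filter by splitting the work across
    // one worker per available core, rather than scanning on one thread.
    public static long countMatches(final List<String> records) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        List<Future<Long>> partials = new ArrayList<Future<Long>>();

        int chunk = (records.size() + cores - 1) / cores;  // ceiling division
        for (int i = 0; i < records.size(); i += chunk) {
            final List<String> slice =
                    records.subList(i, Math.min(i + chunk, records.size()));
            partials.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long n = 0;
                    for (String record : slice) {
                        if (record.contains("ERROR")) {  // hypothetical filter
                            n++;
                        }
                    }
                    return n;
                }
            }));
        }

        long total = 0;
        for (Future<Long> f : partials) {
            total += f.get();  // sum the per-core partial counts
        }
        pool.shutdown();
        return total;
    }
}
```

The point of the sketch is simply that the same node does several slices of work at once; a product like DataRush applies that principle to the dataflow inside a Hadoop job so a single server's cores stay busy.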
Hoskins says that also means Hadoop clusters can be much smaller than they are today, making them easier to manage for IT teams that don't have much experience running large server clusters.
Hadoop, says Hoskins, is much more than a framework for processing large amounts of data; it's evolving into an entirely new computing architecture for building enterprise-class applications. But that won't happen until IT organizations can leverage multicore processors to process that data. Otherwise, they will have to rely on massively parallel database appliances that add cost and complexity to the IT environment.
It's unknown to what extent Hadoop will transform IT, but significant changes are already under way that go well beyond the amount of data that can be processed. Hoskins says that will become much more apparent once a more accessible form of the MapReduce application programming interface becomes available later this year. Once that happens, he says, the use cases for Hadoop will expand rapidly, which in turn will drive a much greater need to tap the largely unexploited parallel processing capabilities of multicore processors.
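For context, the MapReduce interface Hoskins is referring to is the one Hadoop developers already write against today. The canonical word-count job below shows what that looks like with the standard org.apache.hadoop.mapreduce API; the input and output directories are passed as command-line arguments, and the rest is presumably the sort of boilerplate a more accessible API would aim to reduce.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts emitted for each word.
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```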