While there is a lot of interest in the Apache Hadoop framework because of its promise to work cost-effectively with massive amounts of data, there’s also a lot of frustration with the overall performance of Hadoop on industry-standard servers.
To address this issue, the folks at DataStax have come up with a way to run Hadoop on top of the open-source Cassandra database. According to Ben Werther, DataStax vice president of products, the company essentially makes Cassandra appear to an application as the Hadoop Distributed File System (HDFS). This not only allows Hadoop applications to scale more easily, notes Werther, it also addresses Hadoop performance issues without relying on any specialized hardware.
According to Werther, the performance limitations associated with Hadoop stem from the batch-oriented nature of the Hadoop Distributed File System. Inserting Cassandra in its place, he said, resolves that issue without fundamentally altering Hadoop.
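To make the idea of one storage layer standing in for another more concrete, here is a minimal conceptual sketch in Python. It is not DataStax's implementation and uses no Hadoop or Cassandra APIs; the names KeyValueStore, StoreBackedFileSystem, count_words and BLOCK_SIZE are all hypothetical, chosen only for illustration. It assumes nothing more than a file-system-style interface (write and read by path) layered over a keyed table, so that application code written against that interface does not need to change.

```python
from typing import Dict, Tuple

# Hypothetical block size, kept tiny so the example splits data into several blocks.
BLOCK_SIZE = 8


class KeyValueStore:
    """Illustrative stand-in for a database table keyed by (path, block index)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, int], bytes] = {}

    def put(self, key: Tuple[str, int], value: bytes) -> None:
        self._rows[key] = value

    def get(self, key: Tuple[str, int]) -> bytes:
        return self._rows[key]


class StoreBackedFileSystem:
    """Exposes file-style write/read calls while keeping every block in the store."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store
        self._block_counts: Dict[str, int] = {}  # path -> number of blocks written

    def write(self, path: str, data: bytes) -> None:
        # Split the payload into fixed-size blocks and store each one as a row.
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        for index, block in enumerate(blocks):
            self._store.put((path, index), block)
        self._block_counts[path] = len(blocks)

    def read(self, path: str) -> bytes:
        # Reassemble the file by fetching its blocks in order.
        count = self._block_counts[path]
        return b"".join(self._store.get((path, i)) for i in range(count))

    def block_count(self, path: str) -> int:
        return self._block_counts[path]


def count_words(fs: StoreBackedFileSystem, path: str) -> int:
    """Application code: it sees only a file system, not the store behind it."""
    return len(fs.read(path).decode("utf-8").split())


if __name__ == "__main__":
    fs = StoreBackedFileSystem(KeyValueStore())
    fs.write("/logs/sample.txt", b"hadoop on top of cassandra")
    print(count_words(fs, "/logs/sample.txt"))
```

The point of the sketch is the separation: count_words is written against the file-style interface only, so the storage behind that interface can be replaced without touching the application, which is the general pattern Werther describes.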
Cassandra is an emerging open-source database with a strong following among the IT organizations that have been exposed to it. Werther readily concedes that there is not a huge installed base of applications that support Cassandra. But with the popularity of Hadoop rising, DataStax is betting that the need to scale these applications is going to create a lot more interest in Cassandra.
While everyone is excited about the ability to analyze lots of data cost-effectively using Hadoop, nobody really wants to go to the trouble of porting a subset of that data elsewhere in order to apply high-speed analytics. IT organizations would rather keep the processing and analytics all on one platform. And given that they are already working with one emerging technology, it is sometimes better to find the right platform for running those applications than to force-fit a legacy architecture simply because that’s all the IT organization knows how to use at the moment.