
Open Source Project Takes on Hadoop Storage Scalability Issues

Written By
Mike Vizard
Sep 27, 2012

Necessity is often the mother of invention, so when it comes to managing Big Data these days, it shouldn’t be surprising to discover that a lot of organizations are relying more on their own initiative.

A perfect example of that initiative is a new open source project that makes available a new file system for Hadoop. Quantcast, a provider of an analytics service for measuring Web traffic, created the Quantcast File System (QFS) to address storage scalability issues in Hadoop environments. According to Jim Kelly, vice president of research and development at Quantcast, QFS is a more efficient approach to managing storage than the Hadoop Distributed File System (HDFS) that ships with the Apache distribution of Hadoop. QFS is a derivative of the open source Kosmos File System (KFS), which is also known as CloudStore.

Kelly says the key differences between QFS and HDFS are that Quantcast rebuilt the sorter in Hadoop and added a more accessible application programming interface. QFS also runs more tasks in parallel and implements an error recovery mechanism based on Reed-Solomon error correction, which reduces storage costs by protecting data with compact parity information rather than full copies, so less of the disk space attached to a Hadoop cluster goes to redundancy. According to Kelly, that has enabled Quantcast to reclaim as much as half the disk space in its Hadoop cluster, which not only reduces storage costs, but also reduces physical space requirements in the data center and the amount of energy consumed.
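
The "half the disk space" figure is consistent with simple arithmetic on redundancy overhead. The sketch below is an illustration only. It assumes three-way replication on the HDFS side and a Reed-Solomon layout of six data stripes plus three parity stripes on the QFS side; neither number is stated in this article, and the function names are hypothetical.

```python
# Rough comparison of raw disk needed to store the same logical data
# under full replication versus Reed-Solomon style erasure coding.
# ASSUMPTIONS (not from the article): 3 replicas, 6 data + 3 parity stripes.

def raw_tb_replicated(logical_tb, replicas=3):
    """Raw capacity when every block is stored `replicas` times."""
    return logical_tb * replicas


def raw_tb_erasure_coded(logical_tb, data_stripes=6, parity_stripes=3):
    """Raw capacity when each group of data stripes carries extra parity stripes."""
    return logical_tb * (data_stripes + parity_stripes) / data_stripes


logical_tb = 40  # daily volume Quantcast reports loading into its cluster

replicated = raw_tb_replicated(logical_tb)
encoded = raw_tb_erasure_coded(logical_tb)
savings = 1 - encoded / replicated

print(f"replicated:     {replicated} TB raw")
print(f"erasure coded:  {encoded} TB raw")
print(f"space saved:    {savings:.0%}")
```

Under these assumptions, replication stores three bytes for every byte of data while the encoded layout stores one and a half, which is where a saving of about half would come from.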

Quantcast developed QFS to deal with the more than 40TB of data the company is pumping into its Hadoop cluster every day. While most organizations are still piloting their Hadoop projects, Kelly says it’s only a matter of time before they encounter the same scalability limitations of HDFS that Quantcast did.

Because storage vendors don’t have a real financial motivation to come up with technologies that reduce storage consumption, Kelly says Quantcast felt the time had come to build a larger community to support the continuing development of QFS.

This move by Quantcast raises the question of whether organizations can really trust storage vendors to address one of the most vexing and costly challenges in all of IT. In theory, competition between storage vendors should result in more efficient systems that help rein in storage costs. What’s not apparent is whether vendors are stifling the pace of innovation by delivering incremental, rather than major, improvements that address only part of a storage management problem that is becoming critical.

In fact, that slow pace of innovation across the industry appears to be giving birth to a multitude of open source projects that are not only less expensive to adopt, but are also becoming the driving force for software innovation across the entire IT industry. In other words, there are growing signs that IT organizations, out of sheer frustration, are losing patience with waiting for vendors to solve their problems.

Michael Vizard is a seasoned IT journalist, with nearly 30 years of experience writing and editing about enterprise IT issues. He is a contributor to publications including Programmableweb, IT Business Edge, CIOinsight and UBM Tech. He formerly was editorial director for Ziff-Davis Enterprise, where he launched the company’s custom content division, and has also served as editor in chief for CRN and InfoWorld. He also has held editorial positions at PC Week, Computerworld and Digital Review.
