Hadoop is significantly disrupting the cost structure of processing data at scale. However, deploying Hadoop is not free; significant costs can add up quickly. Vladimir Boroditsky, a director of software engineering at Google’s Motorola Mobility Holdings Inc., observed in a Wall Street Journal article that “there is a very substantial cost to free software,” noting that Hadoop brings additional costs for hiring in-house expertise and consultants. In all, the primary costs to consider for a complete enterprise data integration solution powered by Hadoop are: software, technical support, skills, hardware and time-to-value.
The first three factors – software, support and skills – should be considered together. While the Hadoop software itself is open source and free, it is typically desirable to purchase a support subscription with an enterprise service-level agreement (SLA). Likewise, software and subscription costs should be weighed as a whole when selecting the data integration tool to work in tandem with Hadoop. In terms of skills, the Wall Street Journal reports that a Hadoop programmer, sometimes also referred to as a data scientist, can easily command at least $300,000 per year. Although a data integration tool may add costs on the software and support side, the right tool can reduce overall development and maintenance costs by dramatically cutting the time needed to build and manage Hadoop jobs. Finally, data integration tool skills are far more broadly available, and far less expensive, than specialized Hadoop MapReduce developer skills.
The emergence of Hadoop as the de facto Big Data operating system has brought on a flurry of beliefs and expectations that are sometimes simply untrue. Organizations embarking on their Hadoop journey face multiple pitfalls that, if not proactively addressed, will lead to wasted time, runaway expenditures and performance bottlenecks. By anticipating these issues and using smarter tools, organizations can realize the full potential of Hadoop. Syncsort has identified five pitfalls to avoid with Hadoop.