
How Hadoop Is Being Used for Business Operations Today


Written By
ITBE Staff
Jun 16, 2016

There is little doubt that Hadoop adoption is growing, not just among enterprise-sized organizations but among small- and medium-sized businesses as well. To understand this maturing market more deeply, Pepperdata conducted a survey about how and why Hadoop is used for business operations.

The 134 survey respondents had varying levels of experience, but all work at companies currently running Hadoop in production. A majority of respondents held software engineering/development, data scientist, or data architect job titles (25 percent, 17 percent, and 12 percent, respectively). The largest group (40 percent) came from the information technology industry, with education and financial services (11 percent and 10 percent) coming in second and third. More than 45 percent have been in production for two years or more, with 15 percent of those being “advanced users” (four years or more in production).
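For readers who want a sense of scale, the percentages above can be turned into rough headcounts. The sketch below is illustrative only: the survey reports percentages rather than counts, so every figure it produces is an estimate, and the names used (`approx_count`, `summarize`, `JOB_TITLES`, `INDUSTRIES`) are hypothetical and do not come from Pepperdata.

```python
# Illustrative only: converts the survey's reported percentages into
# approximate respondent counts. The article gives percentages, not counts,
# so every value computed here is an estimate.

TOTAL_RESPONDENTS = 134  # stated in the article

# Percentages as reported in the article; the dictionary names are hypothetical.
JOB_TITLES = {
    "software engineering/development": 25,
    "data scientist": 17,
    "data architect": 12,
}
INDUSTRIES = {
    "information technology": 40,
    "education": 11,
    "financial services": 10,
}


def approx_count(percent, total=TOTAL_RESPONDENTS):
    """Return the nearest whole number of respondents for a percentage."""
    # Integer arithmetic (round half up) avoids floating-point surprises.
    return (total * percent + 50) // 100


def summarize(shares, total=TOTAL_RESPONDENTS):
    """Map each label to its approximate respondent count."""
    return {label: approx_count(pct, total) for label, pct in shares.items()}


if __name__ == "__main__":
    for name, shares in (("Job titles", JOB_TITLES), ("Industries", INDUSTRIES)):
        print(name)
        for label, count in summarize(shares).items():
            print(f"  {label}: about {count} of {TOTAL_RESPONDENTS}")
```

Because the published percentages are themselves rounded, these counts may be off by a respondent or two in either direction.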

In this slideshow, Pepperdata shares findings from the survey, such as key use cases, the size of Hadoop environments, and biggest challenges to production deployment.




Size Doesn’t Matter

The size of an organization does not always correlate with cluster size. Some of the largest Hadoop deployments belong to small shops, such as ad tech companies, digital marketing firms, and analytics departments, that don’t always have the highest numbers of employees. Most organizations just starting out with Hadoop have one cluster for production and one for test/dev. For those that have not figured out how to reliably run multi-tenant environments, it is common to isolate workloads on separate clusters.


Types of Workloads Do Matter

There is an interesting correlation between the types of workloads and the number of Hadoop clusters in production. Respondents who cited “streaming / real-time” as one of their workloads tended to have more clusters in production (46 percent had four or more clusters). Among respondents who did not have streaming or real-time workloads, only 20 percent had four or more clusters. The move to real time is adding cost and complexity to Hadoop deployments, because cluster isolation is used as a best practice to guarantee performance. To successfully run Hadoop in production, organizations need to start moving away from cluster isolation and toward Quality of Service for Hadoop so they can run real-time/streaming applications (e.g., Spark) alongside batch workloads (e.g., MapReduce) on a single cluster.


Mixed Workloads Bring Cluster Chaos

In terms of the workloads that organizations are running, MapReduce leads the pack, with 70 percent of respondents currently running it in production. Spark and Hive are close on its heels with 65 percent and 57 percent, respectively. Smaller shares of respondents also run HBase, batch workloads, Pig, streaming/real-time workloads, Impala, and Flume. Given the breakdown, it is clear that many organizations are running mixed workloads in production, which increases the risk of cluster chaos.


Hadoop Is Still Hard

Respondents face a number of challenges when working with Hadoop. The biggest challenge reported was a lack of expertise, or a skills gap. Too much time spent troubleshooting, resource contention, and lack of visibility all came up as common problems as well. This list confirms that, despite all the progress made over the past decade, Hadoop is still challenging, especially in production environments.


It’s Going to Get Harder

The Hadoop ecosystem is not only growing but maturing at a very rapid pace. New processing engines running on Hadoop are driving new, real-time production use cases, and these bring their own performance challenges that must be managed to realize true operational value from Hadoop. To address these challenges, organizations using Hadoop and other tools in the ecosystem need to find solutions that help jobs complete on time, facilitate higher utilization of existing hardware resources, and guarantee Quality of Service for Hadoop.
