In a report released in January, Capgemini found that, while 60 percent of executives believed Big Data would disrupt their industry within the next three years, only 13 percent of the organizations surveyed had achieved full-scale production of their Big Data implementations. Moreover, only 27 percent of respondents described their Big Data initiatives as successful.
This comes as no surprise to Ashish Thusoo, CEO and founder of the Big Data-as-a-service company Qubole. As one of the original creators of Facebook’s data infrastructure and one of the authors of Apache Hive, he is intimately familiar with the challenges of working with Hadoop, Spark, Hive, Presto and related technologies. Building out an on-premises Big Data infrastructure takes significant time, investment and technical expertise, and it carries considerable risk as well. That’s why Thusoo believes Big Data belongs in the cloud.
For organizations currently considering their own Big Data deployments, here are six things Big Data software providers won’t tell you about on-premises solutions.
Six On-Premises Big Data Hurdles
The six hurdles below, which Big Data providers would rather you not know about, were identified by Ashish Thusoo, CEO and founder of Qubole.
It May Take Years to Realize Value
Building and maintaining Big Data infrastructure is hard work. It takes specialized expertise, investments in hardware and software, and a team of consultants to deploy, not to mention training on how to use it. Only once the system is installed and teams are trained can organizations start to dig into their data. It’s not uncommon for organizations to begin seeing a return on investment as late as 24 months after the project begins.
You’ll Need to Hire Experts
Once your infrastructure is in place, you’ll need a team of administrators with expertise in Big Data technologies to maintain it. That work includes troubleshooting, fixing and replacing hardware, identifying software bugs, unsticking jobs that hang, and managing clusters, among other tasks. Organizations may need a team of five or more specialists to keep their Big Data systems running smoothly.
Scaling Up or Down Is Onerous
Organizations try to design their Big Data architectures with scalability in mind, yet they often underestimate how much storage and compute power they’ll actually need and how quickly their data will grow. Scaling requires adding hardware and tweaking software, which can take weeks when you need more capacity now. Conversely, an organization that overestimates its workload can be stuck with significant sunk costs for underutilized hardware and support.
Technology and Needs Change Rapidly
There is a wide range of Big Data technologies, and innovation in this sector is happening fast. There’s Hadoop, Hive, Spark, YARN and many more, and who knows what new, must-have technology will emerge next year. When you started your Big Data initiative, batch analytics may have been adequate, but you may later discover that you need more real-time capabilities. Moreover, once organizations start diving into their data, they may find other use cases that drive more value. With an on-premises Big Data system, each new software technology can require integration with existing systems, tuning until it runs stably and correctly, and new expertise to install and maintain it.
Data Is Meant to Be Accessible and Shared
The value of Big Data analytics lies in the insights gained from it, and that value multiplies as data and insights are made available to more people in an organization. A major drawback of on-premises solutions is that they require a fairly high level of expertise from the analysts or data scientists who use them. Big Data software can also have limited collaboration tools, making it difficult for teams to work together on projects or share outcomes.
There’s Risk Involved
Big Data analytics requires some trial and error. Organizations don’t always know what they’re going to find in their data; the insights might lead to significant value, or they might not. Or an organization may find that the initial focus of its Big Data initiative isn’t yielding as much value as hoped, and then turn to other areas of the business and other data sources that might yield more. The risks can be high, especially given the time and investment that often go into rolling out a Big Data program and the infrastructure to support it.