Big Data and Big Infrastructure go hand-in-hand. Despite the dramatic advances in analytics, resource optimization and other tools to accommodate absurdly large volumes, the enterprise is still faced with the need to quickly ramp up resources to handle the onslaught.
The two ways to do that, of course, are to beef up internal infrastructure or push the workload out to the cloud. At the moment, at least, the cloud seems to be the preferred method, given the speed at which it can be deployed and the cost benefits it provides compared to traditional build-outs. But while the cloud may suffice at this stage of the Big Data game, will it hold up over the long term? Or should the enterprise steel itself for a time when even the cloud cannot cost-effectively support Big Data?
The launch of several new cloud-based Big Data platforms has raised hopes that the enterprise will stay ahead of the curve as volumes ramp up. Pivotal drew a number of oohs and aahs this month with its Pivotal One platform, built on the open source Cloud Foundry PaaS architecture. The system supports multi-cloud clusters, allowing on-demand, Linux-based Apache Hadoop environments to be scaled up and down as data requirements dictate. The platform also provides a number of self-service analytics tools that handle data collection, storage, query and other functions.
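To make the elasticity argument concrete, here is a rough sketch of what demand-driven scaling of a Hadoop environment can look like in code. To be clear, the endpoint, client and sizing rule below are hypothetical placeholders, not Pivotal One's actual API; the point is simply that worker capacity follows the data rather than the other way around.

```python
# Illustrative sketch only: the endpoint, parameter names and sizing rule are
# hypothetical, not Pivotal One's actual API. It shows the elastic pattern the
# platform describes: grow or shrink a Hadoop environment as volumes dictate.
import requests

API = "https://cloud.example.com/v1"           # hypothetical control endpoint
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials

def scale_hadoop_cluster(cluster_id: str, ingest_gb_per_hour: float) -> None:
    """Pick a worker count from the current ingest rate and apply it."""
    # Assumed rule of thumb: one worker node per 50 GB/hour of ingest,
    # with a floor of 3 nodes so HDFS replication stays healthy.
    workers = max(3, int(ingest_gb_per_hour / 50) + 1)
    resp = requests.patch(
        f"{API}/clusters/{cluster_id}",
        json={"worker_nodes": workers},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    print(f"Cluster {cluster_id} resized to {workers} workers")

# Example: a batch window pushing 400 GB/hour gets 9 workers; the same call
# with a low rate the next morning shrinks the environment back down.
scale_hadoop_cluster("analytics-prod", ingest_gb_per_hour=400)
```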
Meanwhile, Virtustream has teamed up with Cognilytics to provide a cloud-based Big Data platform that combines Cognilytics’ analytics and business intelligence solutions with Virtustream’s xStream cloud management platform. The plan is to provide a wide range of deployment models targeting key users such as enterprises, government and service providers, all enabled on an IaaS architecture for use on public, private and hybrid infrastructure. The key advantage of this approach is that it provides users with guaranteed CPU, memory, network and storage resources while eliminating restrictions on cloud configurations, virtual machine quantities and usage parameters.
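Again, a rough illustration helps. The spec below shows the sort of per-tenant resource guarantee such an IaaS layer might enforce; the field names and figures are assumptions for illustration, not xStream's actual schema.

```python
# Hedged illustration: a resource-guarantee spec of the kind an IaaS layer
# might accept. Field names and units are assumptions for illustration only.
from dataclasses import dataclass

@dataclass
class ResourceGuarantee:
    vcpu_reserved: int        # CPU cores held for this tenant at all times
    memory_gb_reserved: int   # RAM that cannot be oversubscribed away
    network_mbps_min: int     # floor on network throughput
    storage_iops_min: int     # floor on storage I/O

    def validate(self) -> None:
        if min(self.vcpu_reserved, self.memory_gb_reserved,
               self.network_mbps_min, self.storage_iops_min) <= 0:
            raise ValueError("every guarantee must be a positive reservation")

# The guarantee travels with the workload whether it lands on public, private
# or hybrid infrastructure, which is the point of the IaaS approach above.
analytics_tier = ResourceGuarantee(
    vcpu_reserved=16,
    memory_gb_reserved=128,
    network_mbps_min=1000,
    storage_iops_min=20000,
)
analytics_tier.validate()
```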
Nonetheless, it seems that even top-end cloud solutions may prove no match for Big Data. According to Gartner’s annual Hype Cycle report, the cloud has already passed the “peak of inflated expectations” and is heading toward the “trough of disillusionment.” While the cloud’s cost-benefit ratio is quite favorable for normal enterprise workloads, it becomes less impressive as volumes increase. In the end, the cloud may suffice for sudden upsurges in demand, but not for the slow, steady increase that is Big Data.
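A quick back-of-the-envelope calculation shows why. The prices below are purely illustrative assumptions, not drawn from any provider's rate card; the shape of the curves is the point. Pay-as-you-go costs rise in lockstep with volume, while owned capacity amortizes its step costs.

```python
# Illustrative comparison only; every figure here is an assumption chosen to
# show the shape of the trade-off, not a real price.
def cloud_monthly_cost(tb_stored: float, price_per_tb: float = 100.0) -> float:
    """Pay-as-you-go: every additional terabyte costs the same every month."""
    return tb_stored * price_per_tb

def onprem_monthly_cost(tb_stored: float,
                        capex_per_100tb: float = 150_000.0,
                        amortization_months: int = 36,
                        opex_per_month: float = 2_000.0) -> float:
    """Capacity bought in 100 TB increments, amortized over three years."""
    units = -(-int(tb_stored) // 100)  # ceiling division: number of 100 TB units
    return units * (capex_per_100tb / amortization_months + opex_per_month)

for tb in (10, 100, 500, 1000):
    print(f"{tb:>5} TB  cloud ${cloud_monthly_cost(tb):>9,.0f}/mo  "
          f"on-prem ${onprem_monthly_cost(tb):>9,.0f}/mo")
# With these assumed numbers the cloud wins easily at 10 TB but steadily loses
# ground as volume creeps upward, which is the "slow, steady increase" problem.
```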
This may be why we are likely to see a steady increase in consolidation among cloud providers, says Terremark’s Jim Anthony. The most effective Big Data solutions require a few key elements, such as a single, integrated storage and processing center, a universal high-speed networking infrastructure and end-to-end quality of service programs. Once Big Data forces the enterprise to turn to increasing numbers of disparate cloud providers, cohesiveness in the overall environment will suffer, resulting in higher costs and lower productivity.
The thing to remember in all of this, though, is that the enterprise has a choice as to exactly how deep into Big Data it will venture. While the vendor community is rife with dire warnings that the difference between success and failure may hinge on a minor scrap of unstructured data buried deep in an email system somewhere, the fact is that retention and analysis of data can only extend as far as available infrastructure, and the budgets that provide it, will allow.
The cloud gives organizations a lot of wiggle room in this regard, but it does not and will not provide unlimited scalability with which to parse every last nugget of information. The real challenge going forward, then, is not to capture and control all data, but to intelligently ascertain what is valuable and what is not.