It might surprise you to learn that the vast majority of Big Data analytics takes place within on-premises infrastructure.
In fact, this makes perfect sense: despite what you hear about the rise of the cloud, most Big Data loads reside in the enterprise data center in the form of both structured and unstructured historical data. To lower costs, organizations are placing their analytics capabilities as close to that data as possible.
But this is likely to change relatively quickly.
According to Wikibon, spending on Big Data hit $27.3 billion last year and is expected to top $35 billion in 2015, which is impressive for a phenomenon that didn’t even have a formal name until about three years ago. The cloud, however, accounts for only about $1.3 billion of that market, dwarfed even by the “professional services” (read: consultants) category, which draws about $10.4 billion.
As the decade unfolds, though, more of the Big Data load is expected to come from sensors, devices and other components of the Internet of Things. These streams will likely be too large for the average data center and will therefore have to be diverted to scale-up/scale-out cloud environments for processing and analytics.
Already, cloud providers and analytics companies are laying the groundwork for increased Big Data handling capabilities. In-memory analytics firm Databricks, for example, recently added a new job scheduling module to its Apache Spark-based platform. Spark is the same framework used by Amazon and others to simplify and automate the flow of workloads to the cloud, but in Databricks' case the workloads run on its own Databricks Cloud service. Since Databricks was instrumental in the creation of Spark, it seems natural that the company would use it as a bridge to the cloud. But the fact that the company now runs Big Data loads on in-memory infrastructure exclusively in the cloud speaks volumes about how workloads will be distributed in the near future.
For its part, Amazon is shoring up its own analytics capabilities rather than positioning itself merely as a provider of raw resources. AWS recently unveiled the Amazon Machine Learning service, aimed at poring through large data files to glean useful nuggets of information. The move follows similar deployments on Microsoft’s Azure cloud, but Amazon has the advantage of the wealth of data already stored in the S3 service and processed by the Redshift warehouse and the Relational Database Service (RDS), all three of which will be fully integrated with Amazon Machine Learning. And unlike conventional analytics platforms, intelligent platforms like this one are expected to learn from the data they process, offering up bits of intelligence that users may not even know they need.
Big Data, in fact, is likely to become the cloud’s killer app, says IBM’s James Kobielus, if for no other reason than that it will be easier to lease the hyperscale infrastructure needed to handle the load than to build it. Applications such as social networking, mobile data and collaboration are already more at home in the cloud than in the data center, so it won’t be long before the data these applications generate starts to influence the actions and business models of the enterprise. At that point, it will simply be easier to house Big Data analytics in the cloud because, again, that is where the data is.
This transition is about more than just Big Data, however. It is about employing the most effective means of supporting enterprise data activity as the world economy, and human culture itself, becomes more digitized.
The cloud is proving highly adept at finding the sweet spot between cost and capability, and as it assumes greater responsibility for data, it will naturally provide the most efficient platform for analytics.
Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata, Carpathia and NetMagic.