Big Data implementations are invariably built around Hadoop, Apache Spark and other open source solutions. And since these constructs must integrate into the broader enterprise data ecosystem at some point, is it possible that open source will come to rule the data center as a matter of course?
The idea might not be as outlandish as it sounds. As business models across multiple industries come to rely on the insights gleaned from predictive analytics and broadly federated data infrastructure, proprietary systems may emerge as more a hindrance than a help. And while open systems tend to require quite a bit of in-house knowledge for both provisioning and management functions, many of these tasks are likely to be automated in the coming decade, providing for more user- and enterprise-friendly environments.
Already, many of the proprietary stalwarts of past decades are embracing open source. No less than Microsoft, which once characterized Linux as a “cancer,” is seeing the light, says InformationWeek’s Jessica Davis. The company is planning to launch a Linux version of SQL Server sometime next year, a decision based on the fact that the platform is being passed over by some enterprises because it does not tie directly into their growing fleets of Linux machines. At the same time, many traditional IT vendors are seeing the benefits of the broad collaboration and free-wheeling development that exist within open source communities.
Microsoft is also starting to pour resources into Hadoop, Spark and other Big Data platforms. As noted on TechCrunch, Microsoft announced recently that the on-premises version of the R Server is now powered by Spark, and that it will launch a preview of the Azure version later this summer. In the meantime, Hortonworks users can now access managed Spark services through Azure HDInsight, which is essentially Microsoft’s cloud-based Apache Hadoop distribution. And the company’s Power BI suite also supports Spark Streaming for real-time data ingestion.
Technology consultants like India’s Wipro are also opening up their Big Data platforms. The company recently made the move with its Big Data Ready Enterprise (BDRE) suite so as to streamline the implementation process and provide a more integrated data framework. It also commissioned a study of Oxford Economics surveys that concluded 64 percent of enterprise executives believe Big Data deployments will be driven by open source communities, producing not only a wider range of innovative solutions but faster development as well.
Not all open source distributions are the same, of course. And already, rivals to Apache Spark and other Big Data platforms are taking shape. Concord.io, for one, says it has a stream-based processing solution that outperforms Apache Storm and Spark Streaming, using C++ to provide a 10-fold improvement in message throughput and driving per-event latency to the sub-millisecond level. As explained to TechRepublic’s Matt Asay, this makes it more appropriate for real-time services and applications than for simply streaming out ETL batch jobs, while at the same time enabling high availability even during application deployment, updating and debugging.
It is unlikely that enterprises will start shedding their installed IT bases for open source solutions just because Big Data infrastructure will be open. But as refresh cycles progress, more and more of the data load will find its way to virtual, distributed architectures that will also incorporate Big Data to a large extent.
If the enterprise truly desires a unified, end-to-end data infrastructure, it will have to build as much commonality as possible across disparate platforms. And without open source, the only way to do that is through a single vendor that can satisfy all of its data requirements.
Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.