As an in-memory cluster extension to Hadoop, the Apache Spark project has been gaining a fair amount of momentum as a vehicle for running a variety of real-time applications. This week, an effort to standardize how Apache Spark is implemented was announced at the Spark Summit 2014 conference by Cloudera, Databricks, IBM, Intel, and MapR Technologies.
Justin Erickson, director of product management at Cloudera, says vendors will still compete in terms of how Apache Spark is used as a programming tool to build applications, but by agreeing to standardize the lower-level functions, the vendors are making sure those applications can be ported across different implementations.
Hadoop was originally built as a framework for processing massive amounts of data in batch mode using the relatively arcane MapReduce programming model. As an alternative to MapReduce, Spark allows certain classes of Hadoop applications to run much faster by keeping data in memory.
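For readers unfamiliar with the MapReduce model, a toy sketch in plain Python (not actual Hadoop or Spark code) illustrates the map, shuffle, and reduce phases behind a classic word count. In a real Hadoop job these phases run across a cluster and write intermediate results to disk between stages; it is that disk round-trip that Spark's in-memory execution largely avoids.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key (the word).
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts collected for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["spark runs in memory", "hadoop runs in batch"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)  # e.g. {'spark': 1, 'runs': 2, 'in': 2, ...}
```

The same computation in Spark collapses to a couple of chained transformations over an in-memory dataset, which is one reason iterative workloads such as machine learning see the largest speedups.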
Apache Spark is already being used as the basis for several projects that span everything from streaming for continuous data processing to graph analytics and machine learning. In addition, various other programming environments, including Crunch, Mahout, and Cascading from Concurrent, now support Spark.
The vendor alliance also announced this week that it will collectively work to move the Apache Hive SQL engine to Spark to improve the performance of SQL applications running directly against Hadoop. The group is investigating ways to adapt Apache Pig, Sqoop and Search to utilize Apache Spark as well.
With massive amounts of data already residing in Hadoop, it often makes more sense to run applications in the same environment than to try to move huge volumes of data elsewhere for processing.
Erickson says that while Apache Spark represents one extension to Hadoop, Cloudera will continue to invest in other projects such as Impala, its effort to optimize SQL application performance on Hadoop.