Pentaho Adds Support for Apache Spark

Written By
Mike Vizard
May 12, 2015

Looking to extend the reach of its business intelligence across the realm of Big Data, Pentaho today announced data integration capabilities with the Apache Spark in-memory framework for processing Big Data queries.

Donna Prlich, vice president of product marketing for Pentaho, says the initial use cases for the integration include orchestrating jobs running on Apache Spark.

Longer term, Prlich says Pentaho is also exploring additional use cases that would let users of Pentaho software mingle various types of data processed using Apache Spark with data from a wide variety of other Big Data engines and formats.

The ultimate goal, says Prlich, is to turn Pentaho BI software into a lens through which end users will be able to analyze trends across multiple sources of Big Data without having to master arcane programming tools such as MapReduce.

While there is a lot of interest in Apache Spark software, the technology was originally designed to allow a single data scientist to manipulate data in memory. Now Prlich says Apache Spark proponents are hardening that technology to make it robust enough to deploy within multi-user applications running in production.

Because Apache Spark runs in memory, it’s a much faster alternative to MapReduce or a variety of SQL engines that can be deployed on top of Hadoop itself. But like most Big Data technologies, Apache Spark is still something of a work in progress. In the fullness of time, it will augment existing data warehouses by offloading queries against unstructured data from traditional relational and columnar databases that are optimized for processing more structured data.

Michael Vizard is a seasoned IT journalist, with nearly 30 years of experience writing and editing about enterprise IT issues. He is a contributor to publications including Programmableweb, IT Business Edge, CIOinsight and UBM Tech. He formerly was editorial director for Ziff-Davis Enterprise, where he launched the company’s custom content division, and has also served as editor in chief for CRN and InfoWorld. He also has held editorial positions at PC Week, Computerworld and Digital Review.
