Pentaho Leverages Metadata to Accelerate Big Data Ingestion

Written By
Mike Vizard
Apr 13, 2016

While there’s a lot of enthusiasm for all things relating to Big Data, the hiring of a data scientist can be a frustrating experience for all concerned. All too often, data scientists wind up spending more time cleaning data than actually analyzing anything.

To overcome that challenge, Pentaho, a unit of Hitachi, today released an update to its namesake analytics application that adds support for metadata injection, making it easier to ingest and transform large volumes of similar data. According to Chuck Yarbrough, director of Pentaho solutions, version 6.1 of Pentaho uses that metadata to identify patterns across data sources. The metadata is then passed to the Pentaho Data Integration engine at run time, which dramatically accelerates the extract, transform and load (ETL) process.
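The article does not describe how metadata injection works internally, so what follows is only a generic Python sketch of the underlying idea, not Pentaho's API. A single template transformation is written once, and per-source metadata (here a hypothetical `SourceMetadata` holding a column mapping and type conversions) is supplied at run time, so the same logic can onboard many similarly shaped sources. All names and sample values below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


# Hypothetical metadata describing one source; not a Pentaho data structure.
@dataclass
class SourceMetadata:
    name: str
    field_map: Dict[str, str]               # raw column name -> target field name
    casts: Dict[str, Callable[[str], Any]]  # target field name -> type conversion


def template_transform(rows: List[Dict[str, str]], meta: SourceMetadata) -> List[Dict[str, Any]]:
    """Generic transformation whose behavior is supplied by injected metadata."""
    out = []
    for row in rows:
        record: Dict[str, Any] = {}
        for raw_col, target in meta.field_map.items():
            cast = meta.casts.get(target, str)  # default: keep the value as text
            record[target] = cast(row[raw_col])
        record["source"] = meta.name            # tag each record with its origin
        out.append(record)
    return out


if __name__ == "__main__":
    # Two sources with different column names, handled by the same template.
    crm = SourceMetadata("crm", {"CustID": "customer_id", "Amt": "amount"},
                         {"customer_id": int, "amount": float})
    erp = SourceMetadata("erp", {"customer": "customer_id", "total": "amount"},
                         {"customer_id": int, "amount": float})
    print(template_transform([{"CustID": "7", "Amt": "19.5"}], crm))
    # [{'customer_id': 7, 'amount': 19.5, 'source': 'crm'}]
    print(template_transform([{"customer": "8", "total": "3"}], erp))
    # [{'customer_id': 8, 'amount': 3.0, 'source': 'erp'}]
```

In this sketch, onboarding a new source means writing new metadata rather than a new transformation, which is the kind of repetitive work the article says organizations want to automate.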

Given the repetitive nature of data onboarding, Yarbrough says IT departments are increasingly being asked to take this work off the plates of data scientists, who usually command six-figure salaries. Unfortunately, most organizations are not going to recoup their investment in those data scientists until the data pipeline they depend on becomes more automated. To help with that, version 6.1 of Pentaho also includes a series of blueprints that IT organizations can follow to enable self-service data ingestion, so that someone in the IT department no longer needs to be involved in every ingestion job.

Other enhancements in version 6.1 include the ability to create virtual data sets across a wider range of data blends and to automatically model and publish analytic data. In addition, Yarbrough notes, Pentaho has made it easier to share metrics collaboratively with a broader range of users.

Data scientists may be the rock stars of IT for the moment, but right now they also represent one of the most expensive IT tickets in town. For that reason, every minute a data scientist or analyst spends on what amounts to data maintenance winds up costing the business not only a lot of money in manual labor but also time, delaying any actionable insight from all the data being collected in the first place.

Michael Vizard is a seasoned IT journalist with nearly 30 years of experience writing and editing about enterprise IT issues. He is a contributor to publications including ProgrammableWeb, IT Business Edge, CIO Insight and UBM Tech. He formerly was editorial director for Ziff-Davis Enterprise, where he launched the company's custom content division, and has also served as editor in chief for CRN and InfoWorld. He also has held editorial positions at PC Week, Computerworld and Digital Review.

Property of TechnologyAdvice. © 2025 TechnologyAdvice. All Rights Reserved