Pentaho Unveils Visual Approach to Prepping Big Data

    As Big Data platforms such as Hadoop and Apache Spark become more mainstream, the separation of duties between IT organizations and data scientists needs to be clarified. IT operations teams should exert more control over data preparation, which in turn will free up data scientists to spend more time analyzing data rather than massaging it.

    With that construct in mind, Pentaho, a unit of Hitachi Corp., today announced Pentaho Business Analytics 7.0, which provides a set of visual tools that makes it simpler for IT operations teams to manage the flow of data within any given pipeline.

    Chuck Yarbrough, senior director of solutions marketing and management for Pentaho, says that with this release Pentaho is moving more of the data preparation process into a discrete set of functions that internal IT operations teams can use without having to master arcane extract, transform and load (ETL) tools.

    “We don’t think you need to have an ETL specialist,” says Yarbrough.

    Pentaho Business Analytics uses metadata injection techniques developed by Pentaho to provide a set of graphical tools for managing the data preparation process. The basic idea is to allow internal IT organizations to visually inspect each part of the Big Data preparation process without requiring help from a data scientist, says Yarbrough.

    A lot of data scientists today spend far too much time on data plumbing issues. At a time when most data scientists earn six-figure salaries to create Big Data analytics applications, using them to perform data preparation and integration tasks is a gigantic waste of time and money. At the same time, it’s apparent that IT organizations now need access to data preparation tools that the average IT generalist can use to accomplish those tasks without necessarily having to master ETL tools that were primarily designed for another data management era.

    The challenge is making it possible for data scientists and IT operations teams to work together using a more hand-in-glove approach that ideally removes the data scientist as much as possible from the process of managing the flow of data in and out of any data lake.

    Mike Vizard
    Michael Vizard is a seasoned IT journalist with nearly 30 years of experience writing about and editing coverage of enterprise IT issues. He is a contributor to publications including ProgrammableWeb, IT Business Edge, CIOinsight and UBM Tech. He was formerly editorial director for Ziff-Davis Enterprise, where he launched the company’s custom content division, and has also served as editor in chief for CRN and InfoWorld. He has also held editorial positions at PC Week, Computerworld and Digital Review.
