Now that Hadoop is becoming a mainstream element of the enterprise, IT organizations need to come to terms with how to both manage and secure it. At the Hadoop Summit Europe 2016 conference today, Hortonworks unveiled a spate of updates intended to help IT organizations apply security policies and maintain data governance, simplify the provisioning of clusters in hybrid clouds, and make it easier to interactively explore data in Hadoop.
In addition, Hortonworks and Pivotal, a unit of EMC, have announced that both companies will now standardize on the Hortonworks distribution of Hadoop, and that Hortonworks will resell extract, transform and load (ETL) tools developed by Syncsort.
Via integration with the open source Apache Ranger project, IT organizations will now be able to apply security policies that limit who can see which data stored in Hadoop. Meanwhile, the Apache Atlas project makes it possible to classify data and assign metadata tags, with policies based on those tags then enforced via Apache Ranger. Support for Ranger and Atlas is currently being provided as a technology preview. Hortonworks today also announced the Apache Metron project, which seeks to create an open source security information and event management (SIEM) platform on top of Hadoop.
To make Hadoop simpler to provision, Hortonworks is making available version 1.2 of Cloudbreak, which adds support for OpenStack and for Windows Azure Storage Blob (WASB) on Microsoft Azure to the Hadoop provisioning tool. Hortonworks is also previewing a future update to the Apache Ambari performance tracking tool that will include pre-built dashboards for HDFS, YARN, Hive and HBase.
Finally, Hortonworks is making available a technical preview of Apache Zeppelin, a tool that analysts and data scientists can use to interactively explore data and run sophisticated data analytics directly within a web browser.
Matt Morgan, vice president of product and alliances for Hortonworks, says all these tools are intended to make it simpler for organizations to employ Hadoop as a data lake through which both new and legacy applications can share access to massive amounts of data. Of course, to make that happen throughout the enterprise, IT organizations need to be able to secure data and account for who has access to it.
In the meantime, IT organizations would be well advised to determine which distribution of Hadoop they ultimately plan to standardize on, at a time when the platform as a whole is rapidly becoming enterprise class.