Recognizing that data governance concerns are holding back deployments of Hadoop in production environments, Hortonworks announced this week that it is partnering with Aetna, Merck, Target and SAS to launch the Data Governance Initiative (DGI).
Andrew Ahn, director of product management for governance at Hortonworks, says DGI will create a customizable metadata framework on top of Hadoop that IT organizations will be able to invoke using a rules-based policy engine.
In addition, DGI will create an audit store that will make it simpler to discover how data was employed in a particular context, while also working to integrate the metadata framework with existing Apache Falcon data life cycle management and Apache Ranger data security projects.
Ahn says DGI will also lay the groundwork for additional long-term initiatives. Once the metadata framework is established, it will become more feasible, for example, to expose particular data sets via an application programming interface (API) that analytics applications can invoke.
The most critical aspect of DGI, says Ahn, is that each instance of the metadata framework can be customized to meet the specific needs of each customer. Rather than imposing a data governance framework from the top down, the DGI effort is intended to let customers create their own taxonomies for tagging data so that it can be managed consistently via a rules-based engine.
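To make the idea of tagging data with a custom taxonomy and managing it through rules more concrete, here is a minimal sketch in Python. It is purely illustrative and does not reflect any actual DGI design or API: the `Rule` and `Dataset` classes, the `actions_for` function, and the tag and action names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical governance rule: data carrying `tag` triggers `action`."""
    tag: str
    action: str


@dataclass
class Dataset:
    """Hypothetical data set carrying tags from a customer-defined taxonomy."""
    name: str
    tags: set = field(default_factory=set)


def actions_for(dataset, rules):
    """Return the sorted, de-duplicated actions of every rule whose tag is on the data set."""
    return sorted({rule.action for rule in rules if rule.tag in dataset.tags})


# Illustrative taxonomy and rules; a real deployment would define its own.
rules = [
    Rule("pii", "mask"),
    Rule("clinical", "audit"),
    Rule("financial", "deny"),
]

claims = Dataset("claims_2014", {"pii", "clinical"})
result = actions_for(claims, rules)
print(result)
```

Because the rules refer only to tags, not to individual data sets, a customer could change its taxonomy or add new data without rewriting the rules themselves, which is the kind of consistency the article describes.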
Ahn concedes that none of these ideas are new from a traditional data warehousing perspective; they just haven’t been applied yet to Hadoop. But what will be more interesting to watch is how broadly they are applied across the enterprise as Hadoop continues to emerge as a “data lake” that all applications wind up drinking from.