Metadata and Governance
Two areas that remain less mature in data lake technologies such as Hadoop are metadata and governance. Here, metadata refers to records of update and access requests as well as to schema. These capabilities are standard in the conventional relational data warehouse, where updates are more easily tracked and schemas are more constrained.
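To make that definition concrete, the sketch below shows one hypothetical shape for a metadata record that keeps a dataset's schema alongside a log of its access and update requests. The class names, fields, and the sample path are illustrative assumptions, not the format of any particular Hadoop tool or catalog.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessEvent:
    """One recorded request against a dataset (hypothetical structure)."""
    user: str
    action: str  # "read" or "update"
    timestamp: datetime


@dataclass
class DatasetMetadata:
    """Hypothetical catalog entry: schema plus a log of access and update requests."""
    path: str
    schema: dict[str, str]  # column name -> type name
    events: list[AccessEvent] = field(default_factory=list)

    def record(self, user: str, action: str) -> None:
        # Only the two request kinds named in the text are accepted.
        if action not in ("read", "update"):
            raise ValueError(f"unknown action: {action}")
        self.events.append(AccessEvent(user, action, datetime.now(timezone.utc)))

    def last_updated_by(self) -> str | None:
        # Walk the log backwards to find the most recent update request.
        for event in reversed(self.events):
            if event.action == "update":
                return event.user
        return None


orders = DatasetMetadata("/lake/raw/orders", {"order_id": "bigint", "amount": "decimal(10,2)"})
orders.record("alice", "read")
orders.record("bob", "update")
print(orders.last_updated_by())  # bob
```

A relational warehouse captures much of this automatically; in a data lake, where files can be written by many tools, something has to populate such a record deliberately.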
Open source work on metadata and governance is progressing, but there is no widespread agreement on a particular implementation. For example, Apache Sentry helps enforce role-based authorization for data stored in Hadoop, but it works with some, not all, Hadoop tools.
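Role-based authorization can be illustrated with a small sketch: users belong to groups, groups are granted roles, and roles hold privileges on objects. This mapping is a generic illustration loosely following that common pattern; it is not Sentry's API or policy format, and the users, groups, roles, and paths are hypothetical.

```python
from __future__ import annotations

# Hypothetical policy tables; all names are illustrative only.
USER_GROUPS = {
    "alice": {"analysts"},
    "bob": {"engineers"},
}
GROUP_ROLES = {
    "analysts": {"reader"},
    "engineers": {"reader", "writer"},
}
# Each role holds (action, object) privileges; objects are dataset paths.
ROLE_PRIVILEGES = {
    "reader": {("read", "/lake/sales")},
    "writer": {("write", "/lake/sales")},
}


def is_authorized(user: str, action: str, obj: str) -> bool:
    """Allow the request only if a role reachable from the user's groups holds the privilege."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            if (action, obj) in ROLE_PRIVILEGES.get(role, set()):
                return True
    return False


print(is_authorized("alice", "read", "/lake/sales"))   # True
print(is_authorized("alice", "write", "/lake/sales"))  # False
print(is_authorized("bob", "write", "/lake/sales"))    # True
```

A check like this only protects data when every access path consults it, which is why partial tool coverage matters: a tool that reads the files directly without going through the check bypasses the policy entirely.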
Enterprises that want to manage metadata and governance better currently build custom solutions or simply live with limited functionality. Recently, LinkedIn open sourced WhereHows, an internal tool that may improve the ability to collect, discover, and understand metadata in the data lake. Expect commercial data integration vendors to develop new ways to manage metadata and governance in the enterprise data lake.