Data Lakes: 8 Enterprise Data Management Requirements

Discovery and Preparation

Because Hadoop and other data lake back-end storage platforms accept data in flexible formats, it is common to dump data into the lake before fully understanding its schema. In fact, much of the data in a lake may be highly unstructured. In any case, the cost-effectiveness of storing data in Hadoop makes it possible to prepare the data after it has been acquired. This is closer to ELT (extract, load, transform) than to traditional ETL (extract, transform, load). Still, to do useful work with a data set, its format must eventually be understood.
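As a rough illustration of discovery after loading, the sketch below scans raw JSON-lines records that have already landed in the lake and reports which fields appear, which types each field holds, and how often it is present. It is a minimal single-machine sketch using only Python's standard library; the function name `infer_schema` and the sample records are hypothetical, and a real lake would run this kind of profiling at scale with cluster tools.

```python
import json
from collections import defaultdict


def infer_schema(lines):
    """Scan raw JSON-lines records and summarize the fields found.

    Returns a dict mapping each field name to the sorted list of Python
    type names observed for it and the fraction of records containing it.
    """
    types = defaultdict(set)   # field name -> set of observed type names
    counts = defaultdict(int)  # field name -> number of records containing it
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the raw dump
        record = json.loads(line)
        total += 1
        for key, value in record.items():
            types[key].add(type(value).__name__)
            counts[key] += 1
    return {
        key: {"types": sorted(types[key]), "present_in": counts[key] / total}
        for key in types
    }


# Hypothetical raw records: the same "ts" field arrives as both int and str.
raw = [
    '{"user": "a1", "ts": 1450000000, "amount": 12.5}',
    '{"user": "b2", "ts": "2015-12-13", "tags": ["x"]}',
]
for field, info in infer_schema(raw).items():
    print(field, info)
```

Surfacing inconsistencies such as a timestamp stored as both an integer and a string is exactly the kind of finding that must be resolved before the data can be prepared for analysis.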

In the open source ecosystem, discovery and preparation can be done at the command line with scripting languages such as Python and Pig. Ultimately, native MapReduce jobs, Pig or Hive can be used to extract useful data from semi-structured data. This newly accessible data can feed further analytic queries or machine-learning algorithms. The prepared data can also be delivered to traditional relational databases so that conventional business intelligence tools can query it directly.
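The extract-then-deliver flow described above can be sketched in miniature. In this hedged example, a regular expression stands in for a Pig or Hive extraction job and an in-memory SQLite database stands in for the relational target that BI tools would query. The log format, the `requests` table, and the helper names `extract` and `deliver` are all hypothetical, chosen only for illustration.

```python
import re
import sqlite3

# Hypothetical raw log format: "<ip> <method> <path> <status> <bytes>"
LOG_PATTERN = re.compile(r"^(\S+) (GET|POST|PUT|DELETE) (\S+) (\d{3}) (\d+)$")


def extract(lines):
    """Transform step: parse lines that match the pattern and drop the rest."""
    rows = []
    for line in lines:
        match = LOG_PATTERN.match(line.strip())
        if match:
            ip, method, path, status, size = match.groups()
            rows.append((ip, method, path, int(status), int(size)))
    return rows


def deliver(rows, conn):
    """Load prepared rows into a relational table that SQL tools can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests "
        "(ip TEXT, method TEXT, path TEXT, status INTEGER, bytes INTEGER)"
    )
    conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


raw = [
    "10.0.0.1 GET /index.html 200 5120",
    "corrupted line ###",  # unparseable lines are skipped by extract()
    "10.0.0.2 POST /api/order 500 0",
    "10.0.0.1 GET /about.html 200 2048",
]
conn = sqlite3.connect(":memory:")
deliver(extract(raw), conn)

# A conventional SQL query over the now-structured data.
for status, n in conn.execute(
    "SELECT status, COUNT(*) FROM requests GROUP BY status ORDER BY status"
):
    print(status, n)
```

The raw lines stay untouched in the lake; only the parsed, typed rows are pushed to the relational store, which mirrors the ELT pattern of transforming after loading.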

Commercial products for data discovery and basic data preparation typically provide web-based interfaces (although some are basic on-premises tools for so-called "data blending") for investigating raw data and devising strategies for cleansing it and pulling out the relevant parts. These tools range from "lightweight" spreadsheet-like interfaces to heuristic-based analysis interfaces that help guide data discovery and extraction.

2016 is the year of the data lake. It will surround and, in some cases, drown the data warehouse, and we'll see significant technology innovations, methodologies and reference architectures that turn the promise of broader data access and Big Data insights into a reality. But Big Data solutions must mature beyond their current role as tools used primarily by highly skilled programmers. The enterprise data lake will allow organizations to track, manage and leverage data they've never had access to in the past. When aligned with key business initiatives, new data management strategies are already leading to more predictive and prescriptive analytics that drive improved customer-service experiences, cost savings and an overall competitive advantage.

So whether your enterprise data warehouse is on life support or moving into maintenance mode, it will most likely continue to do what it's good at for the time being: operational and historical reporting and analysis (a.k.a. rear-view mirror).

As you consider adopting an enterprise data lake strategy to manage more dynamic, poly-structured data, your data integration strategy must also evolve to handle the new requirements. Thinking that you can simply hire more developers to write code, or rely on your legacy rows-and-columns-centric tools, is a recipe for sinking into a data swamp instead of swimming in a data lake. In this slideshow, Craig Stewart, VP of product management at SnapLogic, identifies eight enterprise data management requirements that must be addressed to get maximum value from your Big Data technology investments.

