
Three Surprising Reasons Why Businesses Are Building Data Lakes


Written By
Loraine Lawson
Jun 30, 2014

Data experts might be confounded by the concept of data lakes, but apparently that isn’t stopping organizations from building them. A recent PwC feature article contends that data lakes are not only possible but already exist in Hadoop-based repositories.

The article highlights UC Irvine Medical Center’s data lake as one example. The medical center deployed the Hadoop-based data lake as a single place to store records (structured, semi-structured and unstructured data) for more than a million patients.

What UC Irvine’s data lake and others have in common is that they use Hadoop to store data in its native format for later parsing. You still extract and load the data into Hadoop, but you skip the “transform” step of ETL at load time. This solves several problems:

  1. No up-front integration work is required, as there would be with a data warehouse. You’re simply extracting and loading the data, not transforming it.
  2. The data’s integrity and fidelity are maintained, so you can reuse it for different analyses in different contexts.
  3. The architecture is less expensive, less rigid and easier to modify than a relational data warehouse.
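
To make the "extract and load, but don't transform" idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: a local directory stands in for the Hadoop file system, and the function names (`land_raw`, `sha256_of`) and the `raw/<source_system>/` layout are hypothetical, not taken from the PwC article or from UC Irvine's deployment.

```python
import hashlib
import shutil
from pathlib import Path


def land_raw(source_file: Path, lake_root: Path, source_system: str) -> Path:
    """Copy a file into the lake byte-for-byte, with no parsing or transformation."""
    # Hypothetical layout: <lake_root>/raw/<source_system>/<original file name>
    target_dir = lake_root / "raw" / source_system
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    # Extract and load only; any transformation is deferred until read time.
    shutil.copyfile(source_file, target)
    return target


def sha256_of(path: Path) -> str:
    """Checksum used to confirm the landed copy matches the source exactly."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Because the landed copy is byte-identical to the source, comparing `sha256_of(source)` with `sha256_of(landed)` is one simple way to check that fidelity was preserved.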

This significantly changes data integration’s role in how companies handle and manage data. Essentially, the data lake eliminates integration as an initial bottleneck:

“Previous approaches to broad-based data integration have forced all users into a common predetermined schema, or data model. Unlike this monolithic view of a single enterprise-wide data model, the data lake relaxes standardization and defers modeling, resulting in a nearly unlimited potential for operational insight and data discovery.”

It also changes data integration requirements as the data is accessed. Data lakes require fewer integration steps because they don’t enforce a rigid metadata schema.

“Instead, data lakes support a concept known as late binding, or schema on read, in which users build custom schema into their queries,” the PwC article states. “Data is bound to a dynamic schema created upon query execution.”
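
A small Python sketch may help show what schema on read looks like in practice. This is an illustration only: the sample records, field names and the `read_with_schema` helper are all hypothetical, and in a real Hadoop deployment a query engine would do this work rather than hand-written code. The point is that the same raw records are bound to two different schemas at read time.

```python
import json
from typing import Callable, Dict, Iterable, Iterator

# Raw records kept exactly as they arrived (hypothetical sample data).
RAW_LINES = [
    '{"patient_id": "p1", "visit": "2014-01-05", "charge": "120.50", "notes": "follow-up"}',
    '{"patient_id": "p2", "visit": "2014-02-11", "charge": "80"}',
    '{"patient_id": "p1", "visit": "2014-03-02", "charge": "45.25", "device": {"hr": 72}}',
]

# A schema maps an output field name to a function that extracts it from a raw record.
Schema = Dict[str, Callable[[dict], object]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[dict]:
    """Parse each raw line and bind it to the caller's schema at read time."""
    for line in lines:
        record = json.loads(line)
        yield {name: extract(record) for name, extract in schema.items()}


# A billing analyst's schema: charges as numbers, nothing else.
billing_schema: Schema = {
    "patient_id": lambda r: r["patient_id"],
    "charge": lambda r: float(r["charge"]),
}

# A clinical analyst's schema over the same raw data: heart rate, if present.
clinical_schema: Schema = {
    "patient_id": lambda r: r["patient_id"],
    "heart_rate": lambda r: r.get("device", {}).get("hr"),
}

# Query 1: total charges per patient.
total_by_patient: Dict[str, float] = {}
for row in read_with_schema(RAW_LINES, billing_schema):
    pid = row["patient_id"]
    total_by_patient[pid] = total_by_patient.get(pid, 0.0) + row["charge"]

# Query 2: heart-rate readings, with None where a record has no device data.
heart_rates = [row["heart_rate"] for row in read_with_schema(RAW_LINES, clinical_schema)]
```

Neither analyst had to wait for a shared enterprise data model. Each one supplied only the fields and types the question at hand required.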

What does that mean? It means integration will no longer require a massive project by the data warehouse teams and DBAs, but can be done by “localized teams of business analysts and data scientists…”

This actually makes metadata more important, because the more you know about the data, the easier it is to build a query.
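
As a rough illustration of why metadata matters here, the Python sketch below keeps a tiny catalog of what has been loaded into the lake so an analyst can find datasets by the fields they contain. Everything in it (the `DatasetEntry` and `Catalog` classes, the paths and the field names) is a hypothetical stand-in, not a description of any particular metadata tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetEntry:
    """Descriptive metadata recorded when a dataset lands in the lake."""
    path: str
    source_system: str
    data_format: str
    known_fields: List[str] = field(default_factory=list)
    description: str = ""


class Catalog:
    """In-memory catalog; a real deployment would persist this somewhere shared."""

    def __init__(self) -> None:
        self._entries: List[DatasetEntry] = []

    def register(self, entry: DatasetEntry) -> None:
        self._entries.append(entry)

    def find_by_field(self, field_name: str) -> List[DatasetEntry]:
        """Return every dataset known to contain the given field."""
        return [e for e in self._entries if field_name in e.known_fields]


catalog = Catalog()
catalog.register(DatasetEntry(
    path="raw/ehr/visits.json",
    source_system="ehr",
    data_format="json-lines",
    known_fields=["patient_id", "visit", "charge"],
    description="Visit records, loaded untransformed",
))
catalog.register(DatasetEntry(
    path="raw/monitors/readings.csv",
    source_system="monitors",
    data_format="csv",
    known_fields=["patient_id", "hr", "timestamp"],
))

# An analyst looking for billing data can ask which datasets carry a "charge" field.
billing_sources = [e.path for e in catalog.find_by_field("charge")]
```

With even this much recorded at load time, someone building a query knows where to look and which fields to expect, instead of guessing at what was put into the lake.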

Sounds great, right? What’s the catch?

Well… it turns out that a lot of companies are struggling to use the data once they’ve built the data lake. They dump everything in, and then forget what they’ve put in there. In effect, they set out to build a data lake and created a data graveyard, as the CTO of Cambridge Semantics so cleverly puts it.

The PwC article also explains how you can avoid that, of course. It supplies lots of reader-friendly graphics to explain data lakes and how they should function.


IT Business Edge


Property of TechnologyAdvice. © 2025 TechnologyAdvice. All Rights Reserved
