Schism over Data Lakes: Open Source Vet Challenges Gartner’s Dire Take

    Gartner solidified its argument against data lakes last week, raising concerns about data quality, governance and security, and even questioning whether dumping data into one repository would really resolve the data silo problem.

    As I shared in a previous post on data lakes, Gartner analyst Andrew White raised many of these same concerns in his post, “The Confusion and Hype related to Data Lakes.” Now, however, Gartner has elevated those concerns into a full report, “The Data Lake Fallacy: All Water and Little Substance” (registration required to download).

    The tech press is widely covering this announcement, but most of what I’ve read is just a rewrite of Gartner’s related press release.

    You can read it at the source, but here’s the short version: Gartner says a data lake might work for a highly skilled data scientist, but it renders data more or less useless for the rest of us. In fact, you’d be better off leaving it in the data warehouses from which it came than moving it all to a data lake.

    “Data lakes typically begin as ungoverned data stores,” Gartner Research Director Nick Heudecker states in the release. “Meeting the needs of wider audiences require curated repositories with governance, semantic consistency and access controls — elements already found in a data warehouse.”

    Gartner’s hitting pretty hard at some vendors, which it claims are hyping the concept as something any user can leverage.

    “Several vendors are marketing data lakes as an essential component to capitalize on Big Data opportunities, but there is little alignment between vendors about what comprises a data lake, or how to get value from it,” the press release notes.

    White, in his original post, said there was confusion in the marketplace about data lakes, and “vendors, some vendors, are (predictably) taking advantage of the confusion.”

    Frankly, that’s pretty straight talk for an IT research firm and, not surprisingly, it has attracted a gutsy rebuttal.

    “I didn’t want to write about ‘data lakes’ or Big Data projects this week because I wrote about them for the last two weeks. Then this nonsense was released by Gartner,” begins Andrew C. Oliver in an InfoWorld column.

    Oliver is the president and founder of the professional services firm Open Software Integrators, with offices in Chicago, Ill., and Durham, N.C. But more to the point, he was an early developer at JBoss and founded the POI project, now hosted at Apache. He’s also a former board member and now a helper at the Open Source Initiative.

    So, he’s not a lightweight, and he isn’t pulling punches in his column, writing that Gartner “attacks the concept of a data lake without offering any credible alternative.”

    “Instead, Gartner suggests you try even harder with data warehousing,” he adds. He even accuses Gartner of being on the wrong side of history. “The data lake strategy is part of a greater movement toward data liberalization,” Oliver says. And he thinks the firm is basing its assessment on “the intuitive meaning” of the term data lake.

    “Of course you can drown in a data lake! But that’s why you build safety nets like security procedures (for example, access is allowed only via Knox), documentation (what goes where in what directory and what roles you need to find it), and (yes, Gartner) governance,” Oliver writes. “Data lakes are based on new technology. This is a new methodology. Of course there’s a risk, but no real progress is ever made without taking some risk.”

    Well, you can’t disagree with that last statement, although the cautious enterprise might pause before taking those risks.

    What’s unclear to me, though, is whether Oliver read Gartner’s full report (he calls it a FUD report) or based his argument on the press release. I noticed the first link he includes goes to the press release, which is why I raise the question. I asked him on Twitter, but alas, my deadline was upon me before I could receive an answer.

    Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.

    Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.
