Okay, sure, maybe Gartner has a point about this whole “data lake becoming a data swamp” problem. But a recent Information Age piece proposes that organizations can get around all that — and the need for data scientists — with a “data refinery layer.”
Haven’t heard of such a thing? Neither have I, and Google seems to have heard of it only twice: in this article and in an unsourced Word document.
“As data is consolidated, the refinement layer would process, evaluate, correlate and learn from the information passing through it, essentially generating additional insights and information from the data, and also linking to the aforementioned applications to drive value,” the article explains.
That sounds wonderful. Let’s do it! The problem is, after reading the article, I’m still not exactly sure what it is, whether it exists, or whether it could exist.
This piece says it’s by Ben Rossi, but if you read to the bottom it’s “sourced from Matti Aksela, Comptel.” That’s a niche company that focuses on building customer interaction automation systems for telecoms. Apparently, data refinery layers won’t be on discount during today’s sales.
So why am I sharing this? I’ve been researching data lakes and talking with numerous experts (more on that another day), and I realize there’s actually a really good point here.
While there’s some heated debate over the usability and merit of data lakes, large companies are building them for legitimate use cases. Sensor and other Internet of Things data will need to go somewhere, and there are already successful use cases for network intrusion detection and security.
So the real question is how we make data lakes useful more broadly and, ultimately, that’s going to require abstraction layers. One of the benefits of a data lake is supposed to be less data integration work, but wherever there are layers, there seems to be middleware.
We’ll also probably hear a lot of different names for the same tools along the way. That’s just how tech happens.
If data tech history teaches us anything, we can expect that industry-specific vendors will pioneer the first drafts of these tools. So, the piece is worth a read to see what technologists are thinking as they try to solve this problem.
In the meantime, a more practical read for the weekend might be Forbes’ recent article, “3 Major Mistakes Companies Make With Big Data And How To Fix Them.” It’s written by Erik Severinghaus, founder and CEO of digital marketing personalization company SimpleRelevance. Severinghaus discusses more immediate solutions for squeezing business value from Big Data, including the four roles you must have on your Big Data team.
Webinars and Events:
“Postgres – The NoSQL Cake You Can Eat,” Tuesday, Dec. 2, at 2 p.m. ET. Do you have to use NoSQL to achieve goals like managing transactional system data? This webinar discusses an alternative: Postgres, aka PostgreSQL, an object-relational database management system (ORDBMS). Marc Linster, SVP, Products & Services at EnterpriseDB, will discuss ETL, foreign data wrappers and other techniques you can use with this open source solution.
“When Is a Document-Oriented Database the Right Tool for You?” Tuesday, Dec. 2, at 4 p.m. ET. Next week’s The Briefing Room with Dr. Robin Bloor will dig into the challenges of scaling applications and databases. He’ll be briefed by Cloudant Chief Scientist Mike Miller, who will demonstrate denormalizing data into documents for better data management across distributed infrastructure.
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.