
Tackling the Unstructured Data in Big Data


Written By
Loraine Lawson
Jan 12, 2015

There’s a lot of talk about Big Data as if it is one entity. We hear: How do you manage Big Data? How do you govern Big Data? What’s the ROI for Big Data? The problem with this is that it puts too much focus on the technology, while obscuring one of the major challenges in Big Data sets: the unstructured data. 

I suspect CIOs haven’t forgotten that component, since about 80 percent of data in organizations today is unstructured, according to Gartner. That’s a lot of value currently hiding in social media, customer call transcripts, emails and other text-based or image-based files.

That’s a problem, because that also happens to be where you may find the real value in Big Data. These disparate data sets were previously unanalyzed or sitting in application silos. Obviously, Hadoop will let you migrate that into one location, but what then? How do you turn that into valuable information?

This recent Datamation column by Salil Godika goes a long way toward answering these questions. Godika is the chief strategy & marketing officer and Industry Group head at Happiest Minds. I admit this gave me pause, because pieces by chief marketing officers can be too self-serving.

But I give kudos to Godika for proving my misgivings amiss. He’s written a great piece on dealing with unstructured data, even breaking it down into nine manageable byte (get it?) sizes. Unlike other Big Data how-to articles, he’s put the focus on the data, not the technology.

He does recommend creating a data lake, which is actually pretty controversial in the analyst world. Gartner has done a good job of outlining the cons of data lakes, which include the lack of a set definition and a lot of vendor hype.

That’s definitely worth knowing, but I’m not sure it’s relevant here, since Godika isn’t suggesting you dump everything into a Hadoop data lake. In fact, he does just the opposite by requiring you to pinpoint what’s relevant in the first two steps.

“If the information being analyzed is only tangentially related to the topic at hand, it should be set aside,” Godika writes. “Instead, only use information sources that are absolutely relevant.”

That may seem like obvious advice, but given that 80 percent of enterprise data is unstructured, and given how excited IT can get about technology projects… Well, I suspect it’s advice you’ll need to emphasize early and often.


Godika also doesn’t dwell on the technology specifics, but that’s actually what I like about his piece. There’s no shortage of articles about Big Data technologies. By skimming over that, he’s free to focus on a much under-discussed aspect of Big Data, which is how to structure (excuse the pun) the data part of the project.

Hopefully, we’ll see more on this topic as Big Data transitions from the sandbox to part of the real data architecture. Recent research suggests that this should happen this year. Deutsche Bank interviewed 26 CIOs at global companies, who report that they are now more comfortable with Hadoop in particular and foresee the technology as a “significant part of the future data architecture.” The Wall Street Journal includes the details of Deutsche Bank’s research and notes that Gartner says approximately 1,000 companies currently use Hadoop in production.

Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.
