Build for the Future: Integrating Unstructured Data Will Be a Challenge

Loraine Lawson

Did you know Gartner is predicting data will grow by 800 percent over the next five years?

More noteworthy is the prediction that 80 percent of this data will be unstructured — emails, texts, pictures, log data, social media data, XML files, videos, audio — those types of things.

That’s the prediction. It’s a bit intimidating when you consider the problems we already have with unstructured data.

One of the ultimate goals of information management is to take that unstructured data and integrate it with traditional, structured data. That’s not exactly something IT departments typically know how to do.

That needs to change, according to a recent TDWI Checklist Report, “Integrating Structured and Unstructured Data.”

“As more organizations evaluate the potential and scope of big data projects and understand the ramifications, there will be greater recognition that establishing a sound foundation for data integration is a critical factor in any information utilization strategy,” states David Loshin, the president of Knowledge Integrity, Inc. and author of the report. “The challenge of integrating structured and unstructured data will be a key factor for big data success.”

Loshin identifies seven steps you’ll need to take to manage and integrate unstructured data.

I’m not going to lie to you: This is not a simple checklist. Using unstructured data means making it meaningful. It’s not enough to be able to search for “Bob.” You have to know the context and how that impacts the meaning of “Bob.”

To do this, you’ll need what Loshin calls “meaning-based computing techniques,” and that will require layers of technology. You’ll need to be able to scan and parse text; add meta-tagging; and add concept tags (when is “Bob” Bob Newhart and when is it Billy Bob Thornton?). And, of course, all of this will need to be automated.
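To make those layers a little more concrete, here is a minimal Python sketch of the idea: parse the text, attach a couple of metadata tags, then pick a concept tag for an ambiguous term based on the words around it. It is not taken from Loshin's report. The `CONCEPTS` lexicon, the cue words, and the `tag_document` function are all hypothetical names invented for illustration, and a real system would likely use far richer language analysis than counting cue words.

```python
import re
from dataclasses import dataclass, field

# Hypothetical concept lexicon (an assumption for illustration): each possible
# meaning of an ambiguous term is paired with context words that hint at it.
CONCEPTS = {
    "bob": {
        "Bob Newhart": {"newhart", "comedian", "sitcom", "stand-up"},
        "Billy Bob Thornton": {"thornton", "billy", "actor", "film"},
    }
}


@dataclass
class TaggedDocument:
    text: str
    meta: dict = field(default_factory=dict)      # meta tags (source, size)
    concepts: dict = field(default_factory=dict)  # term -> resolved concept


def tokenize(text):
    """Scan and parse: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9'-]+", text.lower())


def tag_document(text, source):
    """Attach meta tags and concept tags to one piece of unstructured text."""
    tokens = tokenize(text)
    token_set = set(tokens)
    doc = TaggedDocument(
        text=text,
        meta={"source": source, "token_count": len(tokens)},
    )
    for term, candidates in CONCEPTS.items():
        if term not in token_set:
            continue
        # Score each candidate meaning by how many of its cue words appear.
        scores = {name: len(cues & token_set) for name, cues in candidates.items()}
        best = max(scores, key=scores.get)  # ties go to the first candidate
        # With no supporting context, leave the term unresolved (None).
        doc.concepts[term] = best if scores[best] > 0 else None
    return doc


if __name__ == "__main__":
    sample = "Bob was hilarious in that old sitcom; a true comedian."
    print(tag_document(sample, source="email").concepts)
```

The design choice worth noticing is that a "Bob" with no surrounding clues stays unresolved rather than being guessed, which is the point of the example: the search hit alone carries no meaning until context is attached.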


In addition to the technology issues, this kind of integration will require close collaboration with the business. For instance, you’ll need to establish a lexicon of key business terms and phrases, as well as the contexts in which those terms and phrases are used.

Loshin offers a good dose of technical guidance in this checklist, but it seems vendor-neutral to me. HP is listed as the sponsor, and sometimes it’s hard to tell if a paper has been skewed to favor a vendor’s capabilities while downplaying weaknesses, but from what I’ve seen, that’s not an issue with TDWI reports. I’m sure a competitor will contest that statement if I’m wrong.

So check it out. Even if you know this is still a long way off for your organization, it’s smart to read it for an idea of what you need to build into your systems and data architecture over the next few years. Plus, it’s a free download, so why not?



Feb 22, 2013 10:10 AM Melissa says:
Bob can easily be found through a data profile. Emails, files, attachments on multiple platforms can be searched easily and automatically, reports generated and then Bob's disposition can be uncovered. Ask Gartner about Data Profiling or check out http://www.indexengines.com/solutions_BusNeed_DM-DA.html The technology exists and it's time and cost effective.
