Definitions: Unstructured Data Integration
Created on: Jan 27, 2009 11:40 AM by Loraine Lawson - Last Modified: Mar 30, 2009 10:48 AM by Loraine Lawson
Definition
Unstructured data refers to data stored as text or rich media (bitmap) objects.
The opposite of unstructured data is structured data. More recently, some analysts have begun to identify a third type, semi-structured data, which includes more formal Word documents, spreadsheets and other office suite documents.
Business applications and concerns
By a commonly cited estimate, about 80 percent of all enterprise information is stored as unstructured data. Given that e-mail, Web logs, call center records, Word documents and spreadsheets are often all counted as “unstructured data,” it's easy to see how executives and managers would benefit from being able to reliably access and query this information.
A 2008 report by the Aberdeen Group showed that best-in-class companies that integrate unstructured data reported:
- Better response time to customer demand.
- Improved employee productivity.
- Reduced risks of harmful events.
- Better insight into customers than their counterparts.
Best-in-class companies also reported that reducing risks by preventing harmful events and increasing employee productivity were the top drivers for pursuing integration of unstructured data.
In recent years, regulatory compliance and data-security issues have forced many companies to act on the problem of unstructured data.
The big challenge with unstructured data is to integrate it with more formal, structured data. For instance, very little unstructured data can be accessed by existing business intelligence tools. If BI tools could draw from both types of data, leaders would gain better insight into the business.
Deployment Options
There is a range of options for finding, storing and accessing unstructured data. Enterprise search tools, enterprise content-management systems, text mining and analytic tools, and intranets are among the solutions companies use to organize unstructured data.
BPM tools have also been used to “bridge the gap” between structured and unstructured data. Geoffrey Weglarz, a veteran of relational database technologies, multidimensional database technologies and linguistics, pointed out three specific situations where BPM had been used to marry unstructured data with structured data in a 2004 DM Review article.
In the past two years, text analytics tools have entered the data-integration market. Philip Russom, an analyst for The Data Warehousing Institute, explained in an IT Business Edge interview that these solutions can analyze natural language and mine it for data that can be imported into database records. Pureplay vendors include Attensity, ClearForest and Clarabridge. Some search tools also include text analytic capabilities, including Inxight, FAST and Endeca.
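To make the idea of mining text for database records concrete, here is a minimal sketch in Python. It is not how any of the vendors named above work: commercial text analytics relies on natural-language processing, while this example substitutes simple regular expressions. The sample notes, the `contacts` table and the function names are all hypothetical.

```python
import re
import sqlite3

# Hypothetical call-center notes; the wording and field names are illustrative only.
NOTES = [
    "Customer jane@example.com called about order 10423; package arrived damaged.",
    "Follow-up with bob@example.org regarding order 10977, refund requested.",
]

# Simple patterns standing in for real text analytics (an assumption of this sketch).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
ORDER_RE = re.compile(r"\border\s+(\d+)", re.IGNORECASE)


def extract_record(note):
    """Pull an e-mail address and an order number out of free text, if present."""
    email = EMAIL_RE.search(note)
    order = ORDER_RE.search(note)
    return {
        "email": email.group(0) if email else None,
        "order_id": int(order.group(1)) if order else None,
        "note": note,
    }


def load_notes(notes):
    """Store the extracted fields next to the raw text in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE contacts (email TEXT, order_id INTEGER, note TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (:email, :order_id, :note)",
        [extract_record(n) for n in notes],
    )
    conn.commit()
    return conn


conn = load_notes(NOTES)
# Once extracted, the fields can be queried like any other structured data.
rows = conn.execute(
    "SELECT email, order_id FROM contacts ORDER BY order_id"
).fetchall()
```

Keeping the raw note in the same row as the extracted fields means the original text remains available when the extraction misses something.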
Colin White, the founder of BI Research, wrote in 2008 that the three main tools for integrating structured data (data federation, data consolidation and data propagation) could also be applied to unstructured data. Unstructured data would require an additional step of transforming the necessary business information into a semi-structured format, such as XML, or a structured format. He explained the challenges to this approach and outlined possible solutions in a bEye Network article.
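The additional transformation step can be illustrated with a short Python sketch that turns a plain-text e-mail into XML. This is an assumption-laden illustration rather than the approach White describes: the message layout (header lines, a blank line, then the body), the `message` root element and the `email_to_xml` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw message: header lines, a blank line, then free-text body.
RAW = (
    "From: jane@example.com\n"
    "Subject: Damaged package\n"
    "\n"
    "Order 10423 arrived damaged. Please advise."
)


def email_to_xml(raw):
    """Convert a simple header-plus-body message into an XML element tree."""
    headers, _, body = raw.partition("\n\n")
    root = ET.Element("message")
    for line in headers.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not "Name: value" pairs
            ET.SubElement(root, name.strip().lower()).text = value.strip()
    ET.SubElement(root, "body").text = body.strip()
    return root


root = email_to_xml(RAW)
xml_text = ET.tostring(root, encoding="unicode")
```

Once the message is in XML, fields such as the subject can be addressed by name, for example with `root.findtext("subject")`, which is what allows integration tools built for structured data to work with it.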
Emerging Solutions
There is also an emerging discipline, information management, devoted to the problem of integrating structured and unstructured data. A Computer Weekly article examined this emerging field, as well as existing integration options and solutions on the horizon.
Another emerging option is the use of semantic technologies to integrate unstructured data.