Definitions: Unstructured Data Integration

Created on: Jan 27, 2009 11:40 AM by Loraine Lawson - Last Modified:  Mar 30, 2009 10:48 AM by Loraine Lawson

Definition

Unstructured data refers to data stored as text or rich media (bitmap) objects.

 

The opposite of unstructured data is structured data. More recently, some analysts have begun to identify a third type: semi-structured data, which includes more formal Word documents, spreadsheets and other office suite documents.

 

Business applications and concerns

Eighty percent of all enterprise information is stored as unstructured data. Given that e-mail, Web logs, call center records, Word documents and spreadsheets are all “unstructured data,” it's easy to see how executives and managers would benefit from being able to reliably access and query this information.

 

A 2008 report by the Aberdeen Group showed that best-in-class companies that integrate unstructured data reported:

  1. Better response time to customer demand.
  2. Improved employee productivity.
  3. Reduced risks of harmful events.
  4. Better insight into customers than their counterparts.

 

Best-in-class companies also reported that reducing risks by preventing harmful events and increasing employee productivity were the top drivers for pursuing integration of unstructured data.

 

In recent years, regulatory compliance and data-security issues have forced many companies to act on the problem of unstructured data.

 

The big challenge with unstructured data is to integrate it with more formal, structured data. For instance, very little unstructured data can be accessed by existing business intelligence (BI) tools. If BI tools could draw from both types of data, leaders would gain better insight into the business.

 

Deployment Options

There is a range of options for finding, storing and accessing unstructured data. Enterprise search tools, enterprise content-management systems, text mining and analytic tools, and intranets are among the solutions companies use to organize unstructured data.

 

BPM tools have also been used to “bridge the gap” between structured and unstructured data. In a 2004 DM Review article, Geoffrey Weglarz, a veteran of relational database technologies, multidimensional database technologies and linguistics, pointed out three specific situations where BPM had been used to marry unstructured data with structured data.

 

In the past two years, text analytics tools have entered the data-integration market. Philip Russom, an analyst for The Data Warehousing Institute, explained in an IT Business Edge interview that these solutions can analyze natural language and mine it for data that can be imported into database records. Pure-play vendors include Attensity, ClearForest and Clarabridge. Some search tools also include text analytic capabilities, including Inxight, FAST and Endeca.
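To make the idea of mining text for database records concrete, here is a minimal sketch in Python. It is not how any of the vendors named above work: a regular expression and a small keyword list stand in for a real text analytics engine, and the call-center notes, the `calls` table and its column names are all hypothetical.

```python
import re
import sqlite3

# Hypothetical call-center notes; the wording and layout are assumptions.
NOTES = [
    "Customer 1042 called on 2009-01-12 about a late delivery. Very unhappy.",
    "Customer 2210 called on 2009-01-15 to praise the support team.",
]

# A simple pattern standing in for a real text analytics engine.
PATTERN = re.compile(
    r"Customer (?P<customer_id>\d+) called on (?P<call_date>\d{4}-\d{2}-\d{2})"
)

# Illustrative keyword list used as a crude negative-sentiment flag.
NEGATIVE_WORDS = {"unhappy", "late", "complaint"}


def extract_record(note):
    """Return (customer_id, call_date, negative_flag), or None if no match."""
    match = PATTERN.search(note)
    if match is None:
        return None
    words = set(re.findall(r"[a-z]+", note.lower()))
    negative = int(bool(words & NEGATIVE_WORDS))
    return (int(match.group("customer_id")), match.group("call_date"), negative)


def load_notes(notes):
    """Extract a record from each note and insert it into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE calls (customer_id INTEGER, call_date TEXT, negative INTEGER)"
    )
    records = [r for r in map(extract_record, notes) if r is not None]
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


conn = load_notes(NOTES)
rows = conn.execute(
    "SELECT customer_id, call_date, negative FROM calls ORDER BY customer_id"
).fetchall()
print(rows)
```

Once the extracted fields sit in a table, they can be queried alongside other structured data with ordinary SQL, which is the point of the approach Russom describes.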

 

Colin White, the founder of BI Research, wrote in 2008 that the three main tools for integrating structured data (data federation, data consolidation and data propagation) could also be applied to unstructured data. Unstructured data would require an additional step of transforming the necessary business information into a semi-structured format, such as XML, or a structured format. He explained the challenges to this approach and outlined possible solutions in a bEye Network article.
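The additional transformation step can be illustrated with a short sketch. This is an assumption-laden example rather than White's method: it takes a hypothetical plain-text e-mail, splits the header lines from the body, and wraps the pieces in XML using Python's standard library. The element names (`document`, `body`) and the sample message are invented for illustration.

```python
import xml.etree.ElementTree as ET


def email_to_xml(raw_email):
    """Split a plain-text e-mail into header fields and body, then wrap in XML."""
    # Headers and body are assumed to be separated by the first blank line.
    header_text, _, body = raw_email.partition("\n\n")
    root = ET.Element("document", attrib={"type": "email"})
    for line in header_text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that are not "Name: value" pairs
            field = ET.SubElement(root, name.strip().lower())
            field.text = value.strip()
    ET.SubElement(root, "body").text = body.strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical message used only to exercise the function.
RAW = "From: alice@example.com\nSubject: Late delivery\n\nMy order has not arrived."
xml_text = email_to_xml(RAW)
print(xml_text)
```

The resulting XML carries the sender, subject and body as separately addressable elements, so a federation, consolidation or propagation tool could treat them like fields rather than free text.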

 

Emerging Solutions

There is also an emerging discipline, information management, devoted to the problem of integrating structured and unstructured data. A Computer Weekly article examined this emerging field, as well as existing integration options and solutions on the horizon.

 

Another emerging option is the use of semantic technologies to integrate unstructured data.

 
