How Computers Really Manage Unstructured Data

Loraine Lawson

One of the more confusing points, for me, has been how this whole Big Data thing fits together. In particular, I'm confused by solutions that purport to take unstructured data and make it useful within a structured database or BI system.


That's because it doesn't, at least not in the way you think, explains Bill Franks, chief analytics officer of Teradata's Global Alliance Programs and a faculty member at the International Institute for Analytics.


"Unstructured data may be an input to an analytic process, but when it comes time to do any actual analysis, the unstructured data isn't utilized," Franks writes in a recent post for the IIA.


He uses the example of matching a fingerprint. Despite what Hollywood tells us, computers don't compare whole fingerprint images against one another. Instead, they identify a set of key points on a print. Those points form a map, or polygon, which is then compared with the corresponding points taken from other prints.
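To make the idea concrete, here is a minimal Python sketch, assuming the key points (often called minutiae) have already been extracted from each print as (x, y) coordinates. The sorted pairwise-distance "signature" is a crude stand-in chosen for illustration; it is not necessarily how Franks or any production fingerprint system compares prints, and real matchers also use each point's type and ridge direction. The point is that the comparison runs entirely on a small, structured list of numbers rather than on the image.

```python
import math
from itertools import combinations
from typing import List, Tuple

# Hypothetical key point: an (x, y) pixel coordinate on the print image.
Point = Tuple[int, int]


def distance_signature(points: List[Point]) -> List[float]:
    """Sorted pairwise distances between key points.

    Distances are unchanged when the whole print is shifted or rotated,
    so this list is a simple structured summary of the key-point polygon.
    """
    return sorted(math.dist(a, b) for a, b in combinations(points, 2))


def similarity(a: List[Point], b: List[Point], tol: float = 2.0) -> float:
    """Fraction of pairwise distances that agree within `tol` pixels (0.0 to 1.0)."""
    sig_a, sig_b = distance_signature(a), distance_signature(b)
    if not sig_a or len(sig_a) != len(sig_b):
        return 0.0  # this toy comparison needs the same number of key points
    agreeing = sum(abs(x - y) <= tol for x, y in zip(sig_a, sig_b))
    return agreeing / len(sig_a)


# Illustrative key points, not real fingerprint data.
print_a = [(12, 40), (35, 18), (60, 55), (22, 75)]
print_a_shifted = [(x + 10, y + 5) for x, y in print_a]  # same finger, offset on the scanner
print_b = [(0, 0), (50, 0), (0, 50), (50, 50)]           # a different finger

print(similarity(print_a, print_a_shifted))  # 1.0
print(similarity(print_a, print_b))          # 0.0
```

Because only distances are compared, the shifted copy of the same print scores a perfect match, while the differently shaped polygon does not.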


In other words, what's being analyzed is a shape that is not only much smaller than the fingerprint image but also fully structured, Franks explains.


So when you tap into unstructured data, what you're really using isn't the unstructured data itself but the structured information you've extracted from it. It's a nuance, he says, but an important one.
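A small, hypothetical illustration of that extraction step: the sketch below pulls an order number, a dollar amount and a complaint flag out of free-text customer notes using regular expressions. The patterns, the keyword list and the `Mention` record are all assumptions made for this example; real text analytics would use far richer models. What comes out is an ordinary structured record that any downstream tool can work with.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for fields we expect to find in support notes.
ORDER_RE = re.compile(r"\border\s*#?\s*(\d{5,})", re.IGNORECASE)
AMOUNT_RE = re.compile(r"\$(\d+(?:\.\d{2})?)")
COMPLAINT_WORDS = {"broken", "refund", "late", "cancel"}  # illustrative keyword list


@dataclass
class Mention:
    """Structured fields extracted from one unstructured note."""
    order_id: Optional[str]
    amount: Optional[float]
    complaint: bool


def extract(note: str) -> Mention:
    """Turn one free-text note into a structured Mention record."""
    order = ORDER_RE.search(note)
    amount = AMOUNT_RE.search(note)
    words = set(re.findall(r"[a-z]+", note.lower()))
    return Mention(
        order_id=order.group(1) if order else None,
        amount=float(amount.group(1)) if amount else None,
        complaint=bool(words & COMPLAINT_WORDS),
    )


notes = [
    "Customer says order #48213 arrived broken, wants a refund of $59.99.",
    "Thanks for the quick delivery on order 50017!",
]
for row in (extract(n) for n in notes):
    print(row)
```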


"This makes the information much easier to incorporate into analytic processes and standard tools than most people think," he writes. "For this reason, the thought of using unstructured data really shouldn't intimidate people as much as it often does."

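Once information has been reduced to rows like these, the "standard tools" Franks mentions apply directly. As a hedged sketch, the example below loads a few hypothetical extracted records into an in-memory SQLite database (standing in for any relational store or BI back end) and answers a business question with plain SQL.

```python
import sqlite3

# Hypothetical records already extracted from free-text notes:
# (note_id, order_id, amount, complaint_flag)
extracted = [
    (1, "48213", 59.50, 1),
    (2, "50017", None, 0),
    (3, "51102", 120.25, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE notes ("
    "note_id INTEGER PRIMARY KEY, order_id TEXT, amount REAL, complaint INTEGER)"
)
conn.executemany("INSERT INTO notes VALUES (?, ?, ?, ?)", extracted)

# Ordinary SQL of the kind a BI tool would generate; nothing here is "unstructured".
complaints, disputed_total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM notes WHERE complaint = 1"
).fetchone()
print(f"{complaints} complaints covering ${disputed_total:.2f}")  # 2 complaints covering $179.75
```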

But, oh, that first step's a doozy, as Bugs Bunny would say. The trick is creating a framework or model that can identify those patterns.


Teradata recently made trade press headlines for its partnership with Hortonworks to offer an integrated solution that allows Teradata customers to use Hadoop data with Teradata's Aster, the company's analytics data solution. Gartner analyst Merv Adrian said it's one example of the sort of integration deals relational database vendors are making to ensure we don't wind up with even more fragmentation and data silos.


One thing that's unique about Teradata's approach is that Aster runs the MapReduce framework, which Hadoop also uses, on top of a relational database. Steve Wooledge, senior director of marketing at Teradata Aster, explained by email why that matters:

"Hadoop is better for loading and batch transformations of data because Hadoop has MapReduce on top of a file system. Aster implemented the MapReduce framework on a relational database, which supports both native SQL and MapReduce on one platform, providing fast, interactive analytic processing.
"No one else in the market has this. Only Aster can unify the analytic processing of structured and unstructured data with both SQL and MapReduce natively on one analytic platform. We use Hadoop as a landing/refining place to stage data for analytics where it makes sense."
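To make Wooledge's distinction concrete, here is a toy, single-process Python sketch of the MapReduce pattern (map, shuffle, reduce) computing page views per channel from hypothetical log lines, followed by the same aggregation as one SQL `GROUP BY` query in SQLite. It is not Hadoop's or Aster's API; it only shows that the two styles can express the same computation, which is why a platform that accepts both can be convenient.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw log lines: "<channel> <page> <page_views>"
log_lines = [
    "web /home 3",
    "mobile /product 5",
    "web /checkout 2",
    "email /promo 1",
    "mobile /home 4",
]


def map_phase(lines):
    """Map: parse each raw line and emit a (key, value) pair."""
    for line in lines:
        channel, _page, views = line.split()
        yield channel, int(views)


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


mr_totals = reduce_phase(shuffle(map_phase(log_lines)))

# The same aggregation in SQL, once the data is parsed into a table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (channel TEXT, page TEXT, page_views INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(c, p, int(v)) for c, p, v in (line.split() for line in log_lines)],
)
sql_totals = dict(
    conn.execute("SELECT channel, SUM(page_views) FROM visits GROUP BY channel")
)

print(mr_totals == sql_totals)  # True
```

In a real cluster the map and reduce phases would run in parallel across many machines; the single-process version here keeps only the shape of the computation.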

With its own Big Data options, Teradata might seem more likely to compete with Hadoop than to partner with it, but the company recognizes that many customers already run both Hadoop and Teradata, according to GigaOm's Derrick Harris. Rather smartly, Harris writes, it has decided not just to "play nice" with Hadoop but to embrace it.


"This type of partnership exists because, despite all the hype surrounding Hadoop as the linchpin of any big data strategy, it can still be very difficult to get started with the technology," Harris wrote.


But Teradata isn't just bringing a connector to the Hadoop ball. Aster also includes a library of pre-packaged analytic functions that rely on MapReduce to process Big Data, Wooledge explains.


"These solve specific business problems like doing path and pattern recognition for user behavioral analysis across your website or multiple digital channels, which helps companies better segment and target their customers. It is also used in fraud detection," he said.

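As a rough illustration of what path analysis means (not Aster's own functions, whose interfaces aren't described here), the Python sketch below orders hypothetical clickstream events per user by timestamp and then checks whether each user's path contains the steps home, product, checkout in that order, with other pages allowed in between. The event data and the pattern are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical clickstream events: (user_id, timestamp, page), deliberately unordered.
events = [
    ("u1", 1, "home"), ("u1", 2, "product"), ("u1", 3, "checkout"),
    ("u2", 1, "home"), ("u2", 2, "support"),
    ("u3", 5, "product"), ("u3", 1, "home"), ("u3", 9, "checkout"),
    ("u4", 1, "email_link"), ("u4", 2, "product"),
]


def paths_by_user(events):
    """Group events per user and order each user's pages by timestamp."""
    visits = defaultdict(list)
    for user, ts, page in events:
        visits[user].append((ts, page))
    return {user: [page for _, page in sorted(v)] for user, v in visits.items()}


def matches_pattern(path, pattern):
    """True if the pattern's steps appear in `path` in order, gaps allowed."""
    remaining = iter(path)
    # `step in remaining` consumes the iterator up to the match,
    # so each later step must occur after the previous one.
    return all(step in remaining for step in pattern)


paths = paths_by_user(events)
converters = sorted(
    user for user, path in paths.items()
    if matches_pattern(path, ["home", "product", "checkout"])
)
path_counts = Counter(" > ".join(path) for path in paths.values())

print(converters)                  # ['u1', 'u3']
print(path_counts.most_common(1))  # [('home > product > checkout', 2)]
```

A segmentation step could then treat users who completed the pattern differently from those who stalled after viewing a product.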

Teradata holds a patent on the intellectual property behind its SQL-MapReduce integration, he added.


"The benefit is you have this power through standard BI tools and SQL rather than hiring highly specialized engineers to code and then re-code in Java each time the scope of analysis changes," Wooledge stated. "We also have a visual integrated development environment (IDE) so analysts with minimal knowledge can create custom analytic functions, test locally on their desktop and then deploy to the Aster cluster."

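Wooledge's point, that analysts should be able to call custom analytic logic from SQL rather than rewriting code whenever the question changes, can be illustrated with a general-purpose stand-in. The sketch below uses the user-defined aggregate support in Python's standard `sqlite3` module to register a hypothetical `median` function once; changing the scope of analysis then means editing the `GROUP BY`, not the function. This is not Aster's SQL-MapReduce interface, only an analogy.

```python
import sqlite3
import statistics


class Median:
    """User-defined SQLite aggregate: returns the median of a group's values."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # Called once per row in the group; NULLs are skipped.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Called once per group after all rows have been seen.
        return statistics.median(self.values) if self.values else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("median", 1, Median)  # register once, reuse from any query
conn.execute("CREATE TABLE orders (region TEXT, channel TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("east", "web", 20.0), ("east", "web", 40.0), ("east", "mobile", 90.0),
        ("west", "web", 10.0), ("west", "mobile", 30.0), ("west", "mobile", 50.0),
    ],
)

# Changing the scope of analysis is a change to the SQL, not to the function.
by_region = conn.execute(
    "SELECT region, median(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
by_channel = conn.execute(
    "SELECT channel, median(amount) FROM orders GROUP BY channel ORDER BY channel"
).fetchall()

print(by_region)   # [('east', 40.0), ('west', 30.0)]
print(by_channel)  # [('mobile', 50.0), ('web', 20.0)]
```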


Mar 28, 2012 12:06 PM, Peter J Jamack says:

Technically, Greenplum UAP does the same thing, but uses MapR instead of the Hortonworks distribution of Hadoop. Plus, there are other products that are trying to handle structured and unstructured data all in one platform.

So your article isn't exactly honest in saying "this is the only platform that combines unstructured and structured"





