Big Data player Teradata is holding its annual PARTNERS Conference and Expo this week in Nashville, Tennessee. With it came the usual pack of press releases and interviews promoting upgrades and new offerings. For Teradata, four main announcements came out of the conference:
- Data fabric, enabled by QueryGrid
- In-memory optimization and other improvements to its database
- Connection Analytics
- The new Data Integration Optimization services
I spoke with Dan Graham, Teradata’s general manager, Enterprise Systems, about the new changes with QueryGrid and what Teradata calls the data fabric. What I learned is that, behind all the tech talk, there’s actually a pretty basic story here that focuses on giving business users easier and faster access to all their data.
In case you’re unfamiliar with Teradata’s solutions, QueryGrid was developed for eBay to run queries in parallel on both Teradata’s machines and Hadoop with only Teradata’s tools. A unique aspect of QueryGrid is that it supports true parallel data exchange, which significantly speeds up queries, Graham explained last month.
The latest announcement is, oddly enough, that QueryGrid will now reach into other Teradata systems and optimize those queries, as well. That may seem a bit surprising, since companies generally integrate their own products first, but Graham said they focused on Hadoop because that’s where the demand was.
For instance, you could offload some of your “cold” business data to a Teradata Data Appliance 1700, rather than retain it on the main data warehouse. The update also supports connecting Teradata to Aster databases.
That may sound like something for the net admin team, but here’s why data techies should care: With the new QueryGrid, users can still run queries and get data from both systems “without any knowledge that there was actually this leap from one machine to the other and a significant parallel exchange of data,” Graham explained.
That’s what the company means by data fabric. It’s not an actual product, but rather the result of connecting multiple systems and “giving companies the flexibility to pick their file systems, operating systems, data types, analytic engines, and system design characteristics to meet their business needs,” the release states.
Another way this new release speeds things up and helps users is that it optimizes Hadoop queries. Hadoop may be amazing, but it’s also known for giving you precious little information about what’s in its stores. That can be a problem if, for instance, you have 100 KB of data, but you only need 2 percent of it for your query.
“A lot of this is done upfront with the database, but since Hadoop is not a database, the statistics it collects are quite minimal and we can’t really use them very much for query management yet,” Graham explained. “For example, they don’t have histograms on all the columns on all the fields on all the files. So we have to do that as we get the data in real time because that’s the only way to get it.”
The optimizer helps adjust for that by doing what Teradata calls “incremental or adaptive planning,” he said. During the integration or ETL load, it processes the incoming data, generating statistics about it and planning the fastest query. In short, it allows you to take what you need and leave the rest, which can result in much faster queries, he said. It all happens automatically, he added.
“This is completely invisible to the customer, Hadoop administrator, even to the Teradata programmer,” Graham said. “You just go over to Tableau, tell it you want to do a query that involves pulling data from Hadoop. The data is pulled and the performance is optimized in real time.”
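For readers who want a concrete picture, here is a minimal Python sketch of the general idea of collecting statistics while data streams in and then using them to pick a plan. It is not Teradata's implementation, which the company did not detail. The `StreamingHistogram` class, the `choose_plan` helper, the plan names, and the 25 percent cutoff are all hypothetical, chosen only to illustrate the technique.

```python
class StreamingHistogram:
    """Equal-width histogram updated one value at a time (illustrative only)."""

    def __init__(self, low, high, buckets):
        self.low = low
        self.width = (high - low) / buckets
        self.counts = [0] * buckets
        self.total = 0

    def add(self, value):
        # Clamp out-of-range values into the first or last bucket.
        idx = int((value - self.low) / self.width)
        idx = min(max(idx, 0), len(self.counts) - 1)
        self.counts[idx] += 1
        self.total += 1

    def selectivity_at_least(self, threshold):
        """Estimate the fraction of values >= threshold.

        Works at whole-bucket granularity, so it overestimates when the
        threshold falls in the middle of a bucket.
        """
        if self.total == 0:
            return 1.0  # no statistics yet: assume every row qualifies
        first = int((threshold - self.low) / self.width)
        first = min(max(first, 0), len(self.counts))
        return sum(self.counts[first:]) / self.total


def choose_plan(hist, threshold, cutoff=0.25):
    """Hypothetical rule: filter at the source when few rows are expected to match."""
    if hist.selectivity_at_least(threshold) <= cutoff:
        return "filter_at_source"
    return "bulk_transfer"


# Build statistics as rows arrive, then consult them before planning.
hist = StreamingHistogram(low=0, high=100, buckets=10)
for reading in range(100):  # stand-in for values arriving from a remote store
    hist.add(reading)

print(choose_plan(hist, threshold=90))
print(choose_plan(hist, threshold=20))
```

The point of the sketch is the ordering: the histogram does not exist ahead of time, so it has to be built from the data as it arrives, and the plan is chosen only once those numbers are available.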
The company also plans to expand QueryGrid to support other platforms, such as MongoDB.
So what does that mean for real businesses, in the real world? Graham shared how a manufacturer maintained separate systems for its engineers and business users. The two groups didn’t work together, because they didn’t need to, until the business people wanted to reach into Hadoop for sensor data and join it with data from orders and operational systems.
“As they did this, they begin to realize there were like 10 more use cases, there were 10 more opportunities,” he said. “What was very interesting about this customer was not only all the use cases that popped out of the sensor data and the sharing of data, but the people actually started working together. They actually found common ground and became much more of a team and I think that surprised their database administrator a little bit.”
I’m sure it did. That database admin had stumbled upon integration’s best-kept secret: its ability to support and create alignment.
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist.