How Data Virtualization Helps Make IoT Data Useful

Loraine Lawson

The Internet of Things is among the trends driving companies to invest in data virtualization, according to Suresh Chandrasekaran, senior VP for data virtualization vendor Denodo.

Data virtualization isn’t something you normally hear about in Big Data discussions. I asked Chandrasekaran what problem data virtualization solves for IoT and other Big Data projects. Sensor data is generally pooled in a data repository or data lake, he explained, but it isn’t useful without context.

Data virtualization allows you to leverage sensor and other Big Data and add context using other data sources. For instance, if you’re using sensors to monitor vehicles, you might want to combine that with maintenance records to predict when parts need to be changed.
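That blending step can be sketched in a few lines. The vehicle IDs, vibration threshold, and maintenance schema below are hypothetical, made up purely to illustrate combining a sensor feed with a separate maintenance system:

```python
from datetime import date

# Hypothetical sensor readings pooled in a data lake: (vehicle_id, date, vibration level)
sensor_readings = [
    ("truck-1", date(2015, 1, 10), 0.8),
    ("truck-1", date(2015, 1, 20), 2.9),
    ("truck-2", date(2015, 1, 15), 0.5),
]

# Contextual data from a separate maintenance system: vehicle_id -> last part change
maintenance_records = {"truck-1": date(2014, 3, 1), "truck-2": date(2014, 12, 1)}

VIBRATION_THRESHOLD = 2.5  # assumed alert level, purely illustrative

def flag_for_maintenance(readings, records):
    """Blend raw sensor data with maintenance context to flag likely part wear."""
    flagged = set()
    for vehicle, reading_date, vibration in readings:
        last_service = records.get(vehicle)
        # High vibration on a part not serviced in six months suggests wear
        if (vibration > VIBRATION_THRESHOLD and last_service
                and (reading_date - last_service).days > 180):
            flagged.add(vehicle)
    return flagged

print(flag_for_maintenance(sensor_readings, maintenance_records))  # {'truck-1'}
```

Neither data set predicts much on its own; the signal comes from joining the two, which is exactly the "context" point above.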

“It’s blending that big data, IoT with contextual data and then making predictive analytics on when to do preventative maintenance, add-ons to contracts, things like that,” he said.

For instance, Climate Corporation sells parametric weather insurance to farmers. What makes this business model different from other insurers is that it pays out for unpredictable or unusual weather events that affect crop yields. That involves collecting weather sensor data for every 2.5 square miles of farmland, which generates more than 30 terabytes of data each month.

The weather data doesn’t yield insights alone. It has to be combined with other data sets — that’s the context Chandrasekaran mentioned — then run through predictive analytics to determine if the weather was unusual or extreme. If there was an event, then the next step is combining the data with crop yield data to determine how much the weather anomaly will change crop yields, Chandrasekaran explained.

“So if a particular storm moved through Kansas and it was considered to be an unpredictable or unusual weather event, they would calculate the yield loss and the farmer would get a check - no claims, no adjuster. That’s why it’s parametric,” Chandrasekaran said.
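The payout logic he describes can be sketched as a simple index-based trigger. The rainfall trigger, loss rate, and crop price below are illustrative assumptions, not Climate Corporation’s actual parameters:

```python
def parametric_payout(observed_rainfall_mm, trigger_mm,
                      expected_yield, price_per_unit, loss_rate_per_mm):
    """Parametric insurance: the payout is computed directly from the weather
    index, not from an inspected claim. All parameters here are illustrative."""
    if observed_rainfall_mm <= trigger_mm:
        return 0.0  # no triggering event, no payout
    excess = observed_rainfall_mm - trigger_mm
    # Estimated yield loss grows with the excess, capped at the expected yield
    estimated_yield_loss = min(expected_yield, excess * loss_rate_per_mm)
    return estimated_yield_loss * price_per_unit

# A storm drops 180 mm against a 120 mm trigger: the check is cut automatically
print(parametric_payout(180, 120, expected_yield=500,
                        price_per_unit=4.0, loss_rate_per_mm=2.0))  # 480.0
```

Because the payout is a pure function of the measured index, there is nothing to adjust and no claim to file, which is the "no claims, no adjuster" point in the quote.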

All this requires sifting through Big Data sets to find the useful data. Since it’s high-volume data, you don’t want to copy and load all of it into your analytics system. This is where data virtualization steps in, allowing you to search the data, apply data cleansing and other data quality checks to it, then combine it with other data sets without actually moving it. (Could this be the refinement layer of which some speak? I don’t know.) That saves on hardware costs, as SAS Best Practices Thought Leader Anne Buff pointed out.
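A minimal sketch of that query-time approach, with toy callables standing in for the real source connectors and invented grid-cell data:

```python
# Data virtualization in miniature: expose a "virtual view" that filters,
# cleanses, and joins at query time, so only matching rows cross the wire
# instead of the whole data set being bulk-copied into the analytics system.

def weather_source():
    # Stands in for a 30 TB/month sensor store; yields rows lazily
    yield {"grid": "KS-17", "rainfall_mm": 180, "cleansed": False}
    yield {"grid": "KS-18", "rainfall_mm": 40, "cleansed": False}

def yield_source():
    # Stands in for a separate crop-yield system
    yield {"grid": "KS-17", "bushels_per_acre": 150}
    yield {"grid": "KS-18", "bushels_per_acre": 170}

def virtual_view(predicate):
    """Join weather and yield data on demand, applying cleansing in flight."""
    yields_by_grid = {row["grid"]: row for row in yield_source()}
    for row in weather_source():
        row = {**row, "cleansed": True}  # data-quality step applied inline
        if predicate(row):
            yield {**row, **yields_by_grid[row["grid"]]}

# Only rows matching the query are pulled; nothing is copied up front
extreme = list(virtual_view(lambda r: r["rainfall_mm"] > 120))
print(extreme)
```

A real data virtualization platform would push the predicate down to each source and optimize the join; the point here is only that the combined view exists logically, not as a persisted copy.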

Data Analytics

This use case is where data virtualization may finally shake its data federation roots. Despite the fact that data virtualization has evolved significantly over the past 10 years, it is still sometimes described as data federation or seen as an adjunct to ETL, he said. (It’s also confused with enterprise application integration, an idea countered by this All Analytics column.)

“There was such an entrenched mindset that everything had to be copied and persisted through ETL to do any kind of analytics without impacting source or performance,” he said. That’s no longer the case, he added. “The primary reason people are adopting data virtualization is less about real-time integration and more about abstraction and discovery of my enterprise’s data assets.”

If you’re curious about what that looks like, Denodo published a case study on Climate Corporation. The short paper shows the functions that data virtualization serves within Climate Corp’s BI platform architecture.

You might also want to check out these ITBE posts about the technology:

Understanding Data Virtualization

Curious About Data Virtualization? Tool Lets You Tinker for Free

Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson at Google+ and on Twitter.

Feb 6, 2015 1:46 PM Kevin Petrie says:
Loraine, thanks for this assessment. Big Data integration struggles, many derived from conventional ETL bottlenecks, are probably the greatest challenge faced by analytics practitioners. As you point out, one alternative is to analyze data where it resides, leveraging virtualization. Another alternative, which makes the centralized Data Lake option more feasible, is to use automated data integration software that eliminates manual ETL coding, loading scripts, and duplicate copies of data that doesn't change between loads. Very quickly enterprises can regain the necessary time to correlate previously unconnected data points to support business decisions. Hadoop can be an ideal target for all the necessary data points because it stores raw data from a variety of source formats. - Kevin Petrie, Senior Director, Attunity. @KevinPetrieTech
