Internet of Things Data Poses New Challenges for Cloud-Based Predictive Analytics

Loraine Lawson

As the Internet of Things comes online, it will almost certainly require changes to how IT manages data, according to Gartner analyst Joe Skorupa.

"The enormous number of devices, coupled with the sheer volume, velocity and structure of IoT data, creates challenges, particularly in the areas of security, data, storage management, servers and the data center network, as real-time business processes are at stake," Skorupa, vice president and distinguished analyst at Gartner, states. "Data center managers will need to deploy more forward-looking capacity management in these areas to be able to proactively meet the business priorities associated with IoT."

The highly distributed nature of the IoT will make it impractical to move all of the data to a central location for processing, Skorupa theorizes. Instead, data will be aggregated in “distributed mini data centers where initial processing can occur.” Only the business-relevant data would be sent to a central location for further processing, he added.
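To make that idea concrete, here is a minimal sketch of what an edge node in one of those "distributed mini data centers" might do: aggregate raw sensor readings locally and forward only business-relevant summaries to the central site. The threshold, field names, and functions below are my own illustrative assumptions, not part of any real IoT framework or of Skorupa's design.

```python
# Sketch: aggregate IoT readings at the edge, forward only what matters.
# All names and the alert rule are invented for illustration.

from statistics import mean

ALERT_THRESHOLD = 75.0  # assumed business rule: only escalate hot readings


def summarize_batch(readings):
    """Reduce a batch of raw sensor readings to a compact summary."""
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
    }


def filter_for_central(summary):
    """Forward a summary only if it is business-relevant (here: an alert)."""
    return summary if summary["max"] >= ALERT_THRESHOLD else None


raw = [68.2, 70.1, 77.5, 69.9]  # readings processed locally at the edge
outbound = filter_for_central(summarize_batch(raw))
# Only this small summary, not the raw stream, would cross the network.
```

The point of the sketch is the shape of the architecture: most bytes never leave the edge, which is exactly why deciding what counts as "business-relevant" up front matters so much.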

The problems become more complicated when you consider one major use case for IoT data: predictive analytics.

In a recent O’Reilly Radar column, Beau Cronin describes this as “data gravity.” Cronin says it’s the “widely recognized” idea that, when dealing with large datasets, it makes more sense to bring the computation to the data than to bring the data to a centralized processing spot.

Ahead of November’s International Conference on Predictive APIs and Apps in Barcelona, Cronin is exploring the problems with this approach.

And who is this Cronin fellow? He is the co-founder of predictive analytics company Prior Knowledge (PK), which was purchased by Salesforce in 2012. He also holds a PhD in computational neuroscience from MIT, where he researched probabilistic models of neuronal response. So you might say he’s got the chops to see the big picture here. And what he sees is that data gravity creates a problem for cloud-based analytics services because it conflicts with their basic architecture.

Cronin is exploring the intersection of machine learning, distributed data points (think IoT) and predictive APIs. I gather the idea is to apply machine learning, delivered as a service, so that the service can provide the right data to the right person at the right time and place. In other words, it’s not just predictive APIs: It’s a smart service that learns and improves as it goes.
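That "learns as it goes" pattern can be shown with a toy stand-in for a hosted prediction endpoint: the caller sends feature data and gets a prediction back, and the service refines its answers as observed outcomes are fed in. The class, method names, and payload shapes below are invented for illustration; they are not any vendor's actual API.

```python
# Toy illustration of the predictive-API feedback loop discussed above.
# A real service would sit behind HTTP and use a real model; this one
# just keeps a running average to show the request/feedback pattern.

class ToyPredictiveService:
    """Stands in for a hosted prediction endpoint."""

    def __init__(self):
        self.history = []

    def predict(self, features):
        # With no history yet, fall back to a default guess.
        if not self.history:
            return 0.0
        return sum(self.history) / len(self.history)

    def feedback(self, features, outcome):
        # "Learns as it goes": each observed outcome refines future answers.
        self.history.append(outcome)


svc = ToyPredictiveService()
print(svc.predict({"device": "thermostat-17"}))  # 0.0 -- no data yet
svc.feedback({"device": "thermostat-17"}, 72.0)
svc.feedback({"device": "thermostat-17"}, 74.0)
print(svc.predict({"device": "thermostat-17"}))  # 73.0
```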

Cronin looks at this problem of data gravity and what it means for predictive APIs. What he foresees is a new approach that will “fully assimilate” predictive APIs into the “existing data science stack.”

“The most valuable toolsets will directly support (or at least not disrupt) the whole process, with machine learning and model building closely integrated into the overall flow,” he writes.

As you might expect, it’s a dense, although accessible, read that raises serious concerns about how we’ll manage data in the age of the IoT.


Earlier this week, I looked at how to develop a business-savvy API. Predictive APIs, it seems, are an entirely different level of complication. For instance, predictive analytics requirements vary by audience: predictive APIs for data analysts require different tooling than those designed for developers or data scientists. The problem is that most predictive apps or data products will need to address all three audiences, Cronin writes.

Cronin predicts that this will force predictive API providers to specialize in specific vertical industries. That’s the opposite of what predictive API developers are doing now.

“At this point, elegant and general APIs become not only irrelevant, but a potential liability, as industry- and domain-specific feature engineering increases in importance and it becomes crucial to present results in the right parlance,” he writes. “Sadly, these activities are not thin adapters that can be slapped on at the end, but instead are ravenous time beasts that largely determine the perceived value of a predictive API.”
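Cronin's point about domain-specific feature engineering is easy to see in miniature: the features that make a prediction useful differ completely from one vertical to the next, so a single general-purpose API struggles to serve both. The two functions below are invented examples, not any product's API.

```python
# Why domain-specific feature engineering resists a general API:
# each vertical cares about entirely different derived features.
# Both functions and their field names are illustrative assumptions.

def retail_features(txn):
    # Retail might care about basket size and weekend timing.
    return {
        "basket_size": len(txn["items"]),
        "is_weekend": txn["day"] in ("Sat", "Sun"),
    }


def fleet_features(reading):
    # Fleet telematics might care about harsh braking and idle time.
    return {
        "harsh_brake": reading["decel_mps2"] > 4.0,
        "idle_ratio": reading["idle_s"] / reading["trip_s"],
    }


print(retail_features({"items": ["milk", "bread"], "day": "Sat"}))
print(fleet_features({"decel_mps2": 5.2, "idle_s": 120, "trip_s": 600}))
```

Neither feature set is a "thin adapter" over the other, which is Cronin's argument for why providers end up specializing by industry.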

Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist. Follow Lawson on Google+ and Twitter.

Comment from Pat Hennel (Nov 12, 2014, 1:08 PM):
"Only the business-relevant data would be sent to a central location for further processing." I feel like there would be a lot of stop and go while this is being set up. How do you decide what is business-relevant and what is not? A lot of data would be stuck in the wrong place until someone figured out the correct paths to organize that information.
