At a combined DataWorks Summit/Hadoop Summit conference, Hortonworks today added a streaming analytics capability to version 3.0 of Hortonworks DataFlow (HDF). Hortonworks also announced that it will allow IT organizations to transfer software licenses between instances of its Hadoop distribution running on-premises and in a public cloud.
Jamie Engesser, vice president of product management for Hortonworks, says IT organizations no longer need to write their own code to add streaming analytics. Instead, Hortonworks is embedding that capability in HDF through a Streaming Analytics Manager (SAM), which lets business analysts build streaming analytics applications without writing any code. Hortonworks is also providing access to a shared schema repository that lets applications interact with each other across multiple streaming engines, including the open source Apache Kafka, Apache Storm and Apache NiFi projects.
The goal, says Engesser, is to make it simpler for organizations to build streaming analytics applications without necessarily having to employ the talents of a professional application developer.
Meanwhile, a new Hortonworks Flexible Licensing option is intended to make it simpler for IT organizations to move a Hadoop subscription license from an on-premises deployment to the cloud, or vice versa. Engesser says about 25 percent of the company’s customers have deployed the Hortonworks distribution of Hadoop on a public cloud.
The decision about where to deploy Hadoop is largely driven by where most of an organization’s data is generated. Rather than trying to push all of their data into one central Big Data lake, organizations are now letting the volume of data generated in each location influence where they deploy Hadoop, Engesser says. In fact, Engesser says, it’s becoming more common for organizations to deploy multiple instances of Hadoop both on-premises and in the cloud.
“We’re starting to see a lot more effects from data gravity,” says Engesser.
There’s no doubt at this point that Big Data platforms have become a mainstream element of enterprise IT. The challenge facing IT organizations is how best to manage multiple Big Data platforms that are increasingly strewn across the enterprise. That challenge may take some time to resolve. In the meantime, IT organizations should plan for a much more distributed approach to Big Data than many of them may have originally expected to manage.