These days, there are so many Hadoop-related vendor announcements that I've given up keeping track of them all. But one did catch my eye: Hortonworks is now offering an Apache Hadoop distribution for Windows. And yes, it's still open source.
That's big news for companies of just about every size, since Windows Server holds 73 percent of the server market, according to IDC.
The distribution, called Hortonworks Data Platform for Windows, is in beta and available as a free download. Hortonworks plans for it to offer the same features as its Linux distribution.
But where this is really headed is support for SQL queries against Hadoop data stores. That's big news for Big Data, because there are certainly more programmers who can write SQL queries than programmers who can write MapReduce. Way, way, way more programmers.
While the Windows distribution is big news, it's not the first Hadoop offering to support SQL, as GigaOm's Derrick Harris points out.
“SQL support isn’t the end-game for Hadoop, but it’s the feature that will help Hadoop find its way into more places in more companies that understand the importance of next-generation analytics but don’t want to (or can’t yet) re-invent the wheel by becoming MapReduce experts,” Harris writes.
It's not just about accessing the data stored in Hadoop, he adds. This generation of SQL-on-Hadoop tools lets users query Hadoop data from inside Hadoop.
“The beauty of this approach is that data is usable in its existing form and, in theory, doesn’t require two separate data stores for analytic applications,” he writes.
Harris lists every solution he knows of and gives an overview of how each one works. He divides the available solutions into three groups.
He doesn't include the Hortonworks announcement, probably because the product is still in beta. The beta runs on Windows Server 2008 and Windows Server 2012, but not on desktop versions of Windows.
In other news:
New MinuteSort Record. MapR Technologies revealed this week that it set a new world record for MinuteSort, sorting 15 billion 100-byte records (a total of 1.5 trillion bytes) in 60 seconds. To achieve that, MapR ran its Hadoop distribution on 2,103 virtual instances on Google Compute Engine.
Helping Executives "Get" Data Quality. Trillium Software is a rare bird these days: a pure-play data quality vendor that offers industry-specific solutions for managing claims data, Foreign Account Tax Compliance Act (FATCA) compliance, legal entity data, regulatory capital optimization, Basel regulations and other regulatory challenges that hinge on data quality. The latest release is designed to help govern the quality of both structured and unstructured data. It also adds new data visualization capabilities that, according to the press release, will "enable business executives and managers to visually understand the impact of data quality on their business processes through reporting and business intelligence." I think we can all agree that's a good thing.
SnapLogic Offers Big Data Service. Want to get data out of Hadoop quickly without custom code? SnapLogic recently announced an integration-as-a-service offering for Hadoop, called Big Data as a Service. It runs on Amazon Elastic MapReduce and is certified by Cloudera. The service lets companies move huge volumes of data into and out of Hadoop, and it also addresses data volume, security and ownership concerns.