This week, SAS added Hadoop support in an update to its SAS Enterprise Data Integration Server, making it the latest vendor to offer what is essentially a connector to Hadoop data stores.
The news falls short of the Hadoop end game SAS outlined earlier this year, when it unveiled plans for an in-memory business intelligence platform built on the Hadoop Distributed File System.
But it does at least provide an inroad into Hadoop's data stores, adding Hadoop to a list of supported sources that already includes heavy hitters such as MySQL, Oracle, DB2, SQL Server, Teradata (including Teradata Aster), Sybase, Netezza and EMC Greenplum.
"IDC expects commercial use of Hadoop to accelerate as more established enterprise software providers such as SAS make Hadoop accessible and easy to use," Carl Olofson, IDC research vice president for application development and deployment, is quoted as saying.
So far, this is largely what we've seen with Big Data: Connectors that let you tap into the Hadoop files for more traditional analytics. It's basically pulling out data and adding it into existing tools, thus giving customers a way to dip their toes into Big Data.
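The connector pattern these vendors are shipping boils down to extract-and-load: pull records out of a Hadoop store and push them into a tool you already use. A minimal sketch in Python, using an in-memory SQLite database as the stand-in "existing tool" and hard-coded sample records in place of a real HDFS read (the function names, field names and figures here are invented for illustration, not SAS's actual API):

```python
import sqlite3

def fetch_from_hadoop():
    # Stand-in for a real connector call; a production connector would
    # stream records out of HDFS or Hive rather than return a literal.
    return [
        ("2012-03-01", "clicks", 1042),
        ("2012-03-01", "purchases", 87),
        ("2012-03-02", "clicks", 998),
    ]

def load_into_warehouse(rows):
    # The "existing tool" side: a plain SQL store that BI software can query.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (day TEXT, metric TEXT, value INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn

conn = load_into_warehouse(fetch_from_hadoop())
total_clicks = conn.execute(
    "SELECT SUM(value) FROM events WHERE metric = 'clicks'"
).fetchone()[0]
print(total_clicks)  # 2040
```

The Hadoop side does none of the analysis; it is simply a source to be drained into familiar tooling, which is exactly why connectors are such an easy first step.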
But is this really enough? Is Big Data really that simple: Connect to a data store and boom, you're in?
Not really, according to Amazon CTO Werner Vogels, who recently shared his opinion on the matter during a keynote at the CeBIT trade show.
To make full use of the growing amounts of data many enterprises collect, and to gain a competitive advantage from it, innovation has to occur across the whole data pipeline, not just in analytics, according to Vogels.
"Big data is not only about analytics, it's about the whole pipeline. So when you think about big data solutions, you have to think about all the different steps: collect, store, organize, analyze, and share," Vogels is quoted as saying in a recent InfoWorld article. "It is really important that if you go into this big data world that you have limitless possibilities in your hand. You should not be restricted in the way you store things or the way you process it."
True innovation from Big Data will rely on that entire chain, he added.
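Vogels' five steps can be read as a chain of stages, each feeding the next. A toy Python sketch of that shape (the stage functions and the sample event data are invented for illustration):

```python
def collect():
    # Gather raw events from whatever sources exist.
    return ["user=a action=click", "user=b action=buy", "user=a action=click"]

def store(raw):
    # Persist the events as-is; a list stands in for durable storage here.
    return list(raw)

def organize(stored):
    # Parse the raw lines into structured records.
    return [dict(kv.split("=") for kv in line.split()) for line in stored]

def analyze(records):
    # Count occurrences of each action type.
    counts = {}
    for r in records:
        counts[r["action"]] = counts.get(r["action"], 0) + 1
    return counts

def share(result):
    # Publish the result; here, simply hand it back to the caller.
    return result

report = share(analyze(organize(store(collect()))))
print(report)  # {'click': 2, 'buy': 1}
```

The point of the chain is Vogels' point: a restriction at any one stage - how you store, how you organize - constrains everything downstream, no matter how good the analytics step is.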
That's easy for Amazon, with its Amazon Web Services and EC2 offerings, to say - and, it must be noted, sell. But what about those of you who don't have a massive cloud solution sitting around?
Believe me, Amazon, SAS and others haven't forgotten you; major vendors are still working out how best to bring Big Data to more organizations. Amazon, for instance, offers several Big Data options via its cloud. SAS's big plan for Hadoop is to offer an in-memory BI solution powered by Hadoop - similar to what SAP's HANA and Oracle's Exalytics promise.
Even the data experts are still working out what Big Data means. R "Ray" Wang of Constellation Research, for example, recently suggested expanding the accepted "Volume, Velocity and Variety" definition of Big Data to include:
Virality - how quickly information is dispersed and shared across P2P nodes, and
Viscosity - which measures the resistance to flow in the volume of data, including slowdowns due to "friction from integration flow rates."
There is one other thing you can take away from Vogels' talk: When you do start to dig around in Big Data, you'll need to make sure you're going "big enough." Amazon has found that when its Big Data analytics have made mistakes, it's usually because there wasn't enough data to back up a recommendation.
Big Data is still relatively new, and the market is still exploring how to use it, so try to hold your cynicism in check.