Cloudera, which offers a distribution of Hadoop, announced this week a search engine for data stored in the Hadoop Distributed File System and Apache HBase. The company says it’s the “industry’s first fully integrated search engine for interactive exploration” of data in these systems.
The search engine allows you to use natural language keyword searches, which means the untrained masses will be able to poke around to see what’s in the data.
Cloudera is also offering a related Real-Time Search subscription, which will give you access to technical support and legal indemnification, as well as the ability to provide feedback to the open source project.
The CIOL article explains that integrated search is yet another way Hadoop differs from traditional databases, which attempted integrated search before surrendering to independent search products.
“Hadoop's flexibility makes it well suited for search, and consequently, a better general-purpose platform for data exploration than relational databases,” the article notes.
Big Data News
There were a ton of Big Data announcements this month, largely due to this week’s Big Data Innovation Summit in Boston. Here are a few more key reveals:
Syncsort announced a new technology partnership with Tableau Software. Big Data integration company Syncsort is partnering with analytics company Tableau Software to make dealing with Big Data easier for their customers. Syncsort will bring to the collaboration its ETL platform, which will allow you to collect, process and integrate data into Hadoop faster. Tableau offers a drag-and-drop interface for analyzing the data, including visualization and sharing tools.
As far as I can tell, there’s no specific offering, just the partnership and a goal to make it easier to use Hadoop. This Silicon Angle write-up does the best job of explaining the silo-busting each company hopes to achieve with this deal.
Marketing Data Quality and Management Tool Available for Hadoop. RedPoint Global, a niche data management provider for marketing and one of Gartner’s “cool vendors” for 2013, announced that its data quality tool, called simply Data Management, is now available “on” the Hadoop 2.0 platform.
As I understand it, this means you will be able to use Hadoop’s unstructured data from social media sites and the like for identity resolution, master data management, data parsing, aggregations and automations. In other words, it makes the data in Hadoop play well with your legacy relational database systems.