The Hadoop Summit happened this week in San Jose, and with 90 sessions over two days, I’m sure we’ll be hearing more about it over the next few weeks.
The big unveiling, of course, was YARN, which stands for Yet Another Resource Negotiator. According to reports, it re-architects Hadoop so it can run workloads other than MapReduce.
Arun Murthy, a founder of Hortonworks, described it as allowing “applications to run ‘in’ Hadoop, instead of ‘on’ Hadoop.” Hadoop experts see this as a key development in broadening how Hadoop is used, since it lets Hadoop host applications and processes besides MapReduce.
Beyond YARN, one of the more interesting items I’ve read about is a Hive Query Tool developed by TripAdvisor.
What makes this tool special is that it’s designed to give business users a way to query Hadoop clusters using Hive while bypassing all the Hive and SQL headaches, according to TripAdvisor senior software developer Stephen Scaffidi. It’s called, simply enough, the “Hive Query Tool,” and Scaffidi has released it under the Apache license.
GigaOm writer Jordan Novet included a write-up about the new tool as part of his conference coverage. He also covered NASA’s presentation on using a Hadoop cluster to analyze climate and atmosphere data.
Scaffidi’s team wasn’t impressed with existing tools, so they created their own. His slide presentation points out that he had no Hadoop experience before joining TripAdvisor, and that when he started to tinker with it, he found Hive Thrift Server “horrible.”
In essence, Scaffidi says, the Hive Query Tool runs HQL code through the standard Hive CLI. His presentation slides note that the team “didn’t build anything new, just used the existing Text:Templates module in a clever way.”
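To make the idea concrete, here is a minimal sketch of the general pattern: fill in an HQL template with user-supplied values, then hand the result to the Hive CLI. This is not the Hive Query Tool's actual code. It uses Python's `string.Template` as a stand-in for the templating module Scaffidi mentions, and the table name, column names, and helper functions are all hypothetical.

```python
from string import Template

# Hypothetical HQL template; the table and column names are invented
# purely for illustration and do not come from TripAdvisor's tool.
HQL_TEMPLATE = Template(
    "SELECT page, COUNT(*) AS views "
    "FROM $table WHERE ds = '$day' "
    "GROUP BY page LIMIT $limit"
)


def render_query(table, day, limit):
    """Fill the template placeholders with user-supplied values."""
    return HQL_TEMPLATE.substitute(table=table, day=day, limit=limit)


def hive_command(hql):
    """Build the argument list for the Hive CLI.

    `hive -e` executes a query string passed on the command line.
    The list could be handed to subprocess.run() on a machine that
    has Hive installed; this sketch only constructs it.
    """
    return ["hive", "-e", hql]


if __name__ == "__main__":
    hql = render_query("web_logs", "2013-01-01", 10)
    print(hive_command(hql))
```

The appeal of this approach is that a business user only supplies a few values, such as a date or a row limit, and never has to write HQL. A real tool would also need to validate those values before substituting them, since plain string templating does nothing to stop a malformed or malicious input from changing the query.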