
Apache Hive Initiative Improves SQL Interface


Written By
Loraine Lawson
Oct 24, 2013

Hadoop is big, but there’s no doubt that the game changer will be marrying SQL, the primary language used by business analysts for ad hoc analysis, with Hadoop.

If you don’t want the information in Hadoop to become an inaccessible silo, then you have to address the SQL problem, as a recent Silicon Angle article, “Hadapt Furthers SQL-NoSQL-Hadoop Integration,” points out:

‘The promise of Hadoop is that the scalable, cost-efficient framework means companies can now afford to store all their data and break down the barriers between existing databases. But in reality at many early adopters Hadoop turns into yet another data silo, limiting Hadoop’s value.’

The problem is, SQL needs data in a traditional table format of rows and columns, but this is not how Hadoop organizes data.

Running a SQL query on Hadoop data stores consumes all the cluster resources, triggering performance issues for other applications and jobs running in the cluster, Hortonworks founder and architect Arun Murthy told IDG.

Instead, processes are run with MapReduce, a framework that few already know and that’s difficult to learn.
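To make the gap concrete, here is a minimal in-memory sketch of the map, shuffle and reduce steps needed to answer what a single SQL `GROUP BY` expresses in one line. It is plain Python, not the Hadoop API; the sample records and function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Equivalent SQL, for comparison:
#   SELECT department, COUNT(*) FROM employees GROUP BY department;

# Hypothetical sample records: (employee, department)
RECORDS = [
    ("ana", "sales"),
    ("bo", "engineering"),
    ("cy", "sales"),
    ("di", "support"),
    ("ed", "engineering"),
    ("flo", "sales"),
]


def map_phase(records):
    """Emit one (key, 1) pair per record, keyed by department."""
    for _employee, department in records:
        yield department, 1


def shuffle_phase(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_department(records):
    """Chain the three phases to produce a per-department head count."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(count_by_department(RECORDS))
```

Even in this toy form, the analyst has to think in terms of keys, emitted pairs and aggregation steps rather than stating the question declaratively, which is the learning curve the SQL-on-Hadoop efforts aim to remove.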

If you want data stored in Hadoop to be useful, then you have to address the SQL problem. Vendors know this, and since Hadoop first emerged on the enterprise scene, there have been a number of announcements and solutions all targeting that sweet spot, as Ars Technica reported in its coverage of this year’s Hadoop Summit.

“It looks as if there’s not a Hadoop-related vendor here who isn’t promoting an SQL solution,” Hadoop Summit speaker and Chief Scientist at Mesosphere Paco Nathan told Ars Technica. “And a few of them sound too good to be true.”

In February, Hortonworks and the Apache community announced the Stinger Initiative, a three-phase project targeted at improving the SQL interface with Hadoop.

Since Hive is the de facto SQL engine for Hadoop, the group focused primarily on Hive, but also on more core Hadoop issues such as boosting performance with YARN and, eventually, relying on a latency-reducing framework for complex data tasks called Tez, Jaxenter reports.

The goals are not small and include improving Hive query performance by 100 times to allow for interactive query times of seconds. They also want SQL queries to scale from terabytes to petabytes and for Hive to support a broad range of SQL semantics for analytics applications running against Hadoop.

The project was set up in three phases, two of which are already completed. That means the initiative finished 70 percent of its work in just six months.

Shaun Connolly, Hortonworks’ vice president of corporate strategy, said Stinger is already paying off. A few years ago, a Hive 0.10 query might take 1500 seconds simply because it’s batch-oriented, he explained. In the latest release, Hive 0.12, the same query runs in 10 to 12 seconds.

“So you’re seeing 60, 70, 100 times performance increase just because of this Stinger initiative works at making Apache Hive, the de facto SQL interface, faster as well as more SQL compliant,” Connolly said. “Why that is important is you have this classic BI infrastructure of Tableau, MicroStrategy and others, even Excel, that want to interact with data in Hadoop, leveraging SQL.”

So far, the Stinger Initiative has delivered base (phase I) and advanced (phase II) optimizations, performance boosts by leveraging YARN, and support for:

  • SQL analytic functions
  • SQL types
  • ORCFile, a modern file format
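For readers unfamiliar with the first item, an analytic (window) function computes a value for each row relative to a group of related rows, without collapsing the group. The sketch below mimics in plain Python what a query along the lines of `RANK() OVER (PARTITION BY region ORDER BY amount DESC)` would return. The data and the function name are hypothetical, and this is an illustration of the concept rather than how Hive implements it.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (region, amount)
SALES = [
    ("east", 300),
    ("east", 500),
    ("west", 200),
    ("east", 500),
    ("west", 700),
]


def rank_within_region(rows):
    """Return (region, amount, rank) rows, ranking amounts per region.

    Ties share a rank and the next distinct amount skips ahead,
    which is the usual behavior of SQL's RANK().
    """
    # Sort by region, then by amount descending, so groupby sees each
    # region as one contiguous run in ranking order.
    ordered = sorted(rows, key=lambda row: (row[0], -row[1]))
    result = []
    for region, group in groupby(ordered, key=itemgetter(0)):
        previous_amount = None
        rank = 0
        for position, (_region, amount) in enumerate(group, start=1):
            if amount != previous_amount:
                rank = position
                previous_amount = amount
            result.append((region, amount, rank))
    return result


if __name__ == "__main__":
    for row in rank_within_region(SALES):
        print(row)
```

Every input row survives in the output with its rank attached, which is what distinguishes an analytic function from a plain `GROUP BY` aggregate.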

Phase II was delivered in September and included in Hortonworks Data Platform 2.0 GA, which was released yesterday.

The last phase of Stinger will include:

  • Hive on Apache Tez
  • Always on query service
  • Buffer cache
  • Cost-based optimizer (Optiq)

Hortonworks offers a three-step process, including tutorials, to help SQL users learn about Hive.
