Will a Data Warehouse Appliance Work for You?

Ann All

Ann All spoke with Philip Howard, a research director specializing in data management and development at Bloor Research.


All: Can data warehouse appliances help companies solve some of their data warehouse issues?
Howard: Yes, they can, but not in all situations. There are two good reasons for buying an appliance. One is to save money, and the other is because you can't get the desired performance out of your existing system. The money one is sort of a no-brainer, if you'd like. The price to buy [the appliance] is less, and ongoing maintenance costs are less as well. Appliances don't use indexes, so you require much less database administration. Also, a number of customers don't pre-calculate aggregates at all any more, because they can do it faster with the appliance. That's a big savings on administrative overheads.

In terms of the performance, there are a couple of areas in which appliances can be helpful. One is where you have complex queries with lots of joins, a complex query that a conventional warehouse will take a long time to process. Appliances are much faster. Queries that used to take hours can be done in a few minutes.

DATAllegro has a site at Sears. Sears uses [the appliance] as a front end to their Teradata warehouse to calculate aggregates. So when they want to do slice-and-dice, how many we sold in which stores and of what color, they use the appliance. They bought it because it's much faster to calculate those aggregates using the data warehouse appliance than to put it into the Teradata warehouse.
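The slice-and-dice idea Howard describes (how many were sold, in which stores, in what color) can be illustrated with a minimal sketch that computes aggregates on demand from raw rows instead of pre-calculating them. The rows, the `aggregate` helper, and the dimension names below are all hypothetical and are not taken from Sears, DATAllegro, or Teradata.

```python
from collections import defaultdict

# Hypothetical sales rows: (store, color, units). Illustration data only.
sales = [
    ("Chicago", "red", 3),
    ("Chicago", "blue", 5),
    ("Dallas", "red", 2),
    ("Chicago", "red", 4),
]

# Column position of each dimension in a row (assumed layout).
DIMENSIONS = {"store": 0, "color": 1}


def aggregate(rows, *dims):
    """Sum units grouped by the requested dimensions, scanning the raw rows."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[2]
    return dict(totals)


# Two different "slices" computed at query time, with no stored aggregate tables.
by_store_color = aggregate(sales, "store", "color")
by_color = aggregate(sales, "color")
```

The point of the sketch is only that any grouping can be answered from the detail rows at query time; whether that is fast enough without stored aggregates depends on the platform, which is the trade-off discussed above.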


All: What are some of the limitations of data warehouse appliances?
Howard: Lots of data warehouse vendors are talking about processing real-time information and taking action right away. That is not a strength of appliances at the moment. They have very fast loading and processing, but they are not really geared up for real-time information. If you want to combine text analysis with data analysis, they also don't do that. They work well for queries that involve just conventional data. If you were in a hospital and you wanted to calculate morbidity rates and you wanted to bring in doctors' notes along with the hard data, appliances wouldn't have those sorts of capabilities.

When people talk about scalability, they usually mean terabytes of data that can be supported, and appliances are pretty good on that front. But people tend to forget about scalability in terms of users. Increasingly, people are looking to have not just complex queries and data mining, but they also want to be able to have lots of short-running queries from potentially lots of users. To do that requires the ability to support lots of users. Currently, the appliance vendors would be talking about a few hundred users at most, whereas IBM, for example, might talk about thousands.

The products that are currently available are also somewhat limited in their ability to schedule and prioritize queries and so forth. Netezza has built some of those facilities, but they're not as comprehensive as, say, those in DB2. DATAllegro also has some facilities, but not as many as Netezza. So there's a sort of ramp-up. All the [appliance] vendors are moving in that direction, however.


All: Can you give us a better idea of some of the other features that data appliance vendors plan to offer in the near term?
Howard: In general, they are looking to offer the ability to manage a mixed-query workload. Let's say you have a few professors off in a lab doing some data mining, you've got some business analysts doing complex queries, and you've got a bunch of people doing very simple, short stuff. If a data mining query takes 70 minutes instead of 60 minutes, that probably doesn't matter too much. But if the short stuff takes five seconds instead of one second, people really start to notice. So you need to be able to prioritize your queries.
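A minimal sketch of the prioritization Howard describes, assuming a simple scheme in which short queries run before analyst queries, which run before data mining jobs. The workload classes, query names, and the `schedule` and `slowdown` helpers are hypothetical and do not represent any vendor's workload manager.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical workload classes; a lower number is served first.
PRIORITY = {"short": 0, "analyst": 1, "mining": 2}


@dataclass(order=True)
class Query:
    priority: int
    seq: int  # submission order, used to break ties within a class
    name: str = field(compare=False)


def schedule(queries):
    """Return query names in the order a simple priority scheduler would run them."""
    heap = []
    for seq, (name, kind) in enumerate(queries):
        heapq.heappush(heap, Query(PRIORITY[kind], seq, name))
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


def slowdown(actual_seconds, expected_seconds):
    """Relative slowdown as a ratio of actual to expected run time."""
    return actual_seconds / expected_seconds


submitted = [
    ("churn_model", "mining"),
    ("store_lookup", "short"),
    ("regional_margin", "analyst"),
    ("order_status", "short"),
]
run_order = schedule(submitted)

# The figures from the interview: 70 vs. 60 minutes, and 5 vs. 1 second.
mining_ratio = slowdown(70 * 60, 60 * 60)
short_ratio = slowdown(5, 1)
```

With the interview's numbers, the mining job slows by a factor of about 1.17 while the short query slows by a factor of 5, which is why the short queries are placed at the front of the queue in this sketch.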

Appliances open up possibilities for companies that never considered data warehousing before, so there are a number of new vendors about to join the market. Vertica, Michael Stonebraker's new company, will launch shortly. There are more appliance vendors coming.


All: Do you think some of the big data warehouse vendors will add appliances to their product lines?
Howard: HP already introduced a product called Neoview, but they've been very, very quiet about it. It's based on Tandem NonStop technology. IBM packaged up a hardware solution with DB2, but it's not really an appliance the way that DATAllegro or Netezza is. It is pre-tuned and comes packaged with the hardware, so it's simpler to get up and running. They've got some low entry price points. So they've responded to the people like Netezza. The problem for IBM and Oracle is, they can't really turn around and say, "Well, Oracle isn't good enough for this." You can't imagine Larry Ellison saying that, can you?

I think [appliances] could be a disruptive technology for Teradata. Appliance vendors have come in and said, "We can do the things that the low-end Teradata boxes can do." They've been winning lots of that business. It's easy for Teradata to say, "That's low-end, low-margin stuff for us." The problem is [appliance vendors] have more customers now, they start to build new capabilities into the products, they start creeping up into Teradata's market. Teradata gets squeezed because how much market is left?


Sep 10, 2009 7:32 AM Warehouse optimizing says:

I have a question: why does a complex query take longer to process on a conventional warehouse?

