Discerning Analytics in Forests of Information

Michael Vizard

Sometimes what we think can be accomplished with a given technology is limited only by our imagination. Such appears to be the case with SQL, as it becomes increasingly apparent that parallel database architectures can extend the capabilities of SQL to new classes of analytical applications.

ParAccel, a startup led by Barry Zane, a technical co-founder of Netezza, and Rick Glick, a former chief technology officer of Teradata, is trying to pioneer the development of a massively parallel analytic database that supports standard SQL applications.

By supporting SQL, as opposed to requiring IT organizations to master new programming models such as MapReduce that are limited in comparative analytics capability, ParAccel expects to give customers the ability to analyze massive amounts of data without having to change a familiar programming construct. In fact, Zane and Glick argue that it is the current architecture of relational databases, rather than any particular limitation of SQL, that holds SQL back in data-intensive analytic applications.

By employing a massively parallel database architecture, IT organizations will be able to exploit the full potential of multicore processors in a way that extends their current application environments. That approach will then make it easier to bridge the 'information gap': today, IT organizations have to confine their queries to data sets that are limited in size by the capabilities of the underlying relational database, rather than doing more exploratory analysis against large quantities of data. The major benefit of a massively parallel database is an expanded ability to correlate larger sets of data that include much more granular data.

The ParAccel Analytic Database, in contrast, can easily load 9TB of data in about an hour, according to both Zane and Glick, without coming anywhere near the theoretical limitations of the database architecture.

Both Zane and Glick are respectful of their former employers, but note that earlier generations of SQL database architectures are limited in their ability to take advantage of parallel processing, especially compared to the ParAccel database, where none of the system resources have to be shared.
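To make the idea of a database in which no resources are shared a little more concrete, here is a minimal sketch in Python. It is not ParAccel's implementation, and every name in it (ROWS, partition, local_sum, parallel_group_sum) is hypothetical. It only illustrates the general pattern: rows are split across nodes by key, each node computes the equivalent of a SQL SUM ... GROUP BY over its own slice, and the partial results are merged at the end. Threads stand in for nodes here purely for illustration.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows of (region, sale_amount); not taken from the article.
ROWS = [("east", 10), ("west", 5), ("east", 7),
        ("north", 3), ("west", 8), ("north", 1)]


def partition(rows, n_nodes):
    """Give each node a disjoint slice of the rows, chosen by hashing the key."""
    slices = [[] for _ in range(n_nodes)]
    for key, value in rows:
        node = zlib.crc32(key.encode("utf-8")) % n_nodes  # deterministic hash
        slices[node].append((key, value))
    return slices


def local_sum(rows):
    """Aggregate one node's slice only, like SUM(amount) ... GROUP BY region."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def parallel_group_sum(rows, n_nodes=3):
    """Run local_sum on every slice concurrently, then merge the partial totals."""
    slices = partition(rows, n_nodes)
    # Threads are a stand-in for independent nodes; a real system would use
    # separate machines or processes, each with its own memory and disk.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, slices))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds the per-key totals
    return dict(merged)


if __name__ == "__main__":
    print(parallel_group_sum(ROWS))
```

Because each node works only on its own slice, the workers never contend for the same data. The one step that needs coordination is the final merge, which handles a handful of partial totals rather than the raw rows.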

It remains to be seen how traditional database vendors such as Oracle, IBM, Sybase and Microsoft will respond to the challenge of companies such as ParAccel. The current approach to analytics using existing technology is roughly equivalent to analyzing a stand of trees and then extrapolating what the forest looks like. It's a whole other thing to analyze not only a stand of trees, but every tree in multiple forests. As the sheer volume of data continues to grow, it's pretty clear that existing architectures are going to crack under that kind of strain. So what we need to start thinking about is a new way to finally compare all the forests without having to make as many guesses about what kind of trees are actually in them.
