Big Data and Integration: Old Solutions for a New Market?

Loraine Lawson

When you hear about Big Data, the focus is usually on the Internet and how dot-com heavyweights like Google and Twitter use it. But the Internet didn't create Big Data, and Big Data didn't come into use just because of Hadoop.

 

In fact, financial services companies have been dealing with large data loads for a long time, although, as we all know by now, their handling of that data hasn't always been stellar.

 

A recent InformationWeek article, written by Editor-at-Large Doug Henschen, notes that online marketing data company comScore has dealt with Big Data for a decade now. Henschen also points out an under-discussed fact about Big Data: It's not all about the storage.

 


The article focuses on what comScore - and, one assumes, other companies - does to manage huge volumes of data before loading them into storage. As you may have guessed from the fact that I am writing about it, those pre-storage steps involve running the data through integration software to sort and organize it:

... comScore collects about 2 billion new rows of panel data and more than 18 billion new rows of census data each day. That means more than 20 billion rows of new data is loaded into the data warehouse each day. Of course, most every organization will apply compression to reduce storage demands. But comScore also uses Syncsort DMExpress data integration software to sort and bring alphanumeric order to the data before it's loaded into the warehouse. This improves compression ratios.

In essence, the article notes, that integration step reduces every 10 bytes of data to three or four bytes.
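If you want to see why sorting first helps compression, here's a rough back-of-the-envelope sketch in plain Python. It uses the standard zlib library rather than DMExpress, and the sample rows are invented, but the effect is the same: sorted data puts identical rows next to each other, which compressors love.

```python
import random
import zlib

# Hypothetical sample: 50,000 log rows drawn from a small pool of
# users and pages, roughly how a warehouse feed might look.
random.seed(0)
rows = [f"user{random.randint(1, 500)},page{random.randint(1, 50)}"
        for _ in range(50_000)]

unsorted_blob = "\n".join(rows).encode()
sorted_blob = "\n".join(sorted(rows)).encode()

# Compress each version and compare sizes.
unsorted_size = len(zlib.compress(unsorted_blob))
sorted_size = len(zlib.compress(sorted_blob))

print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
```

On my toy data, the sorted version compresses to a fraction of the unsorted size - the same principle, if not the same ratios, behind sorting before the warehouse load.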

 

If you don't know about Syncsort's data software, you're not alone - even though the company has been in business for more than 40 years. Its data processing and integration software hails from the old days of mainframes, and it's designed to deal with large amounts of data. As of last year, when I interviewed Syncsort technology strategist Joe Lichtenberg, its client list included 90 of the Fortune 100 companies. As you might expect, it hasn't exactly been a product for the budget-minded, smaller organization. But the company is finding a new market as more organizations become involved with Big Data, and it's interesting to see how this mainframe-era company is applying its technology today.

 

Recently, I received an email from a press person informing me that Syncsort had a new executive team and, apparently, a new approach to how it markets its product, since the email noted that "Syncsort advocates a new best practice, data integration acceleration."

 

As the InformationWeek article explains, data integration acceleration is one way to speed up Big Data loads and to reduce some of the complexity of ETL:

Not every company operates at comScore's scale, but the lesson is that not every big-data challenge is best left to the high-powered database platform to solve. Sorting, filtering, aggregation, and transformation steps can streamline data before it gets to the data warehouse, saving CPU cycles and storage space before and after the crucial data-loading stage.
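As a toy illustration of those pre-load steps - filtering, aggregating and sorting before the warehouse ever sees the data - here's a sketch in plain Python. The event data and field names are invented, and a real pipeline would use a tool built for the job, but the shape of the work is the same.

```python
from collections import Counter

# Hypothetical raw feed: (user, page, status) events.
raw_events = [
    ("u1", "/home", "ok"), ("u2", "/home", "error"),
    ("u1", "/buy", "ok"), ("u3", "/home", "ok"),
    ("u1", "/home", "ok"),
]

# Filter: drop rows the warehouse doesn't need (here, errors).
clean = [(user, page) for (user, page, status) in raw_events
         if status == "ok"]

# Aggregate: collapse event-level rows into per-(user, page) counts.
counts = Counter(clean)

# Sort: emit load-ready rows in key order, which also compresses better.
load_rows = sorted((user, page, n) for (user, page), n in counts.items())
print(load_rows)
```

Five raw events become three warehouse rows before the database does any work - which is the whole point of streamlining data upstream of the load.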

I think it's great that we're seeing some press about how integration and processing of data tie in with Big Data beyond the basic database/storage issues. I hope to hear more about it, obviously.

 

But I suspect it could also become confusing as we see more solutions and marketing messages trying to grab a piece of that growing Big Data pie. For instance, after reading the article, I'm still not sure whether this concept of data integration acceleration is unique to Syncsort - and possibly more of a marketing term - or whether there are other solutions in this space. I can tell you that a quick Google search shows almost all of the results reference back to Syncsort, which in the past has been a sign of a marketing term. Syncsort's email explained the process this way:

The Syncsort approach rests on four basic tenets: High performance at scale, minimum resource utilization, ease of use with no tuning required and the ability to integrate with other data integration platforms. The result is simpler maintenance with all transformations happening in one place, faster responses to new demands for information, and greater flexibility to adapt to changing conditions. Ultimately, it is about utilizing data faster and more efficiently than the competition.

When I first read this and the InformationWeek article, I wondered if MapReduce was an open source tool for this same function. Apparently not. IT Business Edge's Mike Vizard wrote this week that Syncsort is offering a DMExpress Hadoop Edition to accelerate Hadoop's processing of MapReduce and hide some of its complexity.
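For readers who, like me, were fuzzy on what MapReduce actually does, here's a stripped-down sketch of the map-shuffle-reduce pattern in plain Python. It's the canonical word-count example; it illustrates the pattern Hadoop distributes across a cluster, not Hadoop's actual API.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit (key, 1) pairs, one per word in an input line.
    return [(word, 1) for word in record.split()]

def reduce_phase(key, values):
    # Reduce: combine all values emitted for one key.
    return key, sum(values)

lines = ["big data big load", "data load data"]

# Shuffle: group mapped pairs by key, as Hadoop does between phases.
grouped = defaultdict(list)
for line in lines:
    for key, value in map_phase(line):
        grouped[key].append(value)

result = dict(reduce_phase(k, v) for k, v in grouped.items())
print(result)  # {'big': 2, 'data': 3, 'load': 2}
```

Hadoop's value is running the map and reduce steps in parallel over terabytes on commodity machines - which is also where its complexity, the part DMExpress Hadoop Edition promises to hide, comes from.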

 

I plan on catching up with Syncsort soon to find out more, but my big question is about cost. One reason that Big Data is becoming more accessible to more organizations is that Hadoop makes it very affordable to store and process large amounts of data.

 

I suspect this won't be the last time I'll be confused about Big Data offerings. What's going to be tricky is separating what you need from what you don't - and whether there's an existing open source product that fills the same need. It'll be interesting to see, then, how add-ons such as DMExpress Hadoop Edition will do in the marketplace.



May 12, 2011 2:42 AM Brian Hopkins says:

The term "Big Data" is a bit of a misnomer, so it's causing a lot of confusion as vendors ramp up the hype. We are thinking about it in terms of not only big volume, but also high velocity, variety and variability.

Some of the most interesting uses of technologies such as Hadoop are coming from the velocity and variability characteristics of "data at an extreme scale" - which is perhaps a better thing to think of when you hear the words "Big Data".

What we are seeing is that it's not just about handling large amounts of complex data - agreed, we have been doing that for years, as you point out. It's more about handling it in ways that are faster, cheaper and more forward-looking than our current technology allows. Hadoop is just one example, and perhaps the most overused and misunderstood.

I'm blogging and tweeting quite a bit on this topic leading up to my discussion on it at Forrester's IT Forum in May. If you're coming, please let me know or reach out via my blog or tweet me for more dialog. This is a very interesting, important and misunderstood emerging technology.

Brian Hopkins

Principal Analyst Serving Enterprise Architecture Professionals

@practicingea

May 12, 2011 2:45 AM Brian Hopkins says: in response to Brian Hopkins

BTW... here's my last blog post; it touches on Big Data's impact on our notion of a Data Warehouse. Have one more post coming this week that goes into more detail.

http://blogs.forrester.com/brian_hopkins/11-05-05-not_your_grandfathers_data_warehouse

Sep 9, 2011 9:44 AM asigurari locuinte says:

It's always tricky to rely on Big Data. If you don't know how to go about doing it, you'll end up with skewed results, which in turn will cause you to make bad decisions. So you'd better pay attention.

Feb 17, 2014 8:13 AM Chirieac Bogdan says:

I agree with you. It's not all about the storage.
