While I focus on integration and data management issues, IT Business Edge’s Arthur Cole writes about infrastructure, including the actual databases. But sometimes, it’s not possible to separate the hardware from the data — especially when you start talking about Big Data.
Two recent articles show how storage can affect business goals and your integration strategy.
First, there’s a recent ReadWrite Enterprise article explaining how Hadoop breaks down silos by changing the cost of managing data. Without Hadoop, back in 2008, managing 168 TB of data for three years would have cost you about $2.33 to $2.62 million. With Hadoop, you can double that storage amount and keep your three-year costs to around $1.05 million, according to Doug Cutting, chief architect at Cloudera and creator of Hadoop.
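To see just how big that gap is, you can run the per-terabyte math on the article’s figures. A quick back-of-the-envelope sketch (the dollar amounts are the article’s; the per-TB breakdown is mine):

```python
# Rough per-terabyte math using the article's figures: a 2008 baseline of
# 168 TB at $2.33-$2.62 million over three years, versus double the storage
# (336 TB) at roughly $1.05 million with Hadoop.

baseline_tb = 168
baseline_cost_low, baseline_cost_high = 2_330_000, 2_620_000

hadoop_tb = 2 * baseline_tb      # the article doubles the storage amount
hadoop_cost = 1_050_000

per_tb_low = baseline_cost_low / baseline_tb     # ~ $13,900 per TB
per_tb_high = baseline_cost_high / baseline_tb   # ~ $15,600 per TB
per_tb_hadoop = hadoop_cost / hadoop_tb          # ~ $3,125 per TB

print(f"Traditional: ${per_tb_low:,.0f}-${per_tb_high:,.0f} per TB over 3 years")
print(f"Hadoop:      ${per_tb_hadoop:,.0f} per TB over 3 years")
print(f"Roughly {per_tb_low / per_tb_hadoop:.1f}x-"
      f"{per_tb_high / per_tb_hadoop:.1f}x cheaper per terabyte")
```

In other words, Hadoop’s cost advantage here works out to roughly four to five times cheaper per terabyte, before you even count the doubled capacity.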
Another bonus: Hadoop runs on distributed hardware that can hold any type of data, with Hadoop’s software layer managing it all. That’s why it scales well, and why it can be used to break down information silos in the enterprise.
The article doesn’t go into archiving, but that’s turning out to be another big use case for Hadoop. Instead of archiving to tape, more companies are archiving to Hadoop clusters, which means that older data is readily available for running more in-depth analysis.
Okay — you probably knew that. But the other piece of the puzzle is how that affects the business. In the past, you had to build for specific business assumptions because there was simply no easy or affordable way to retain all of your data. You had to prioritize.
Big Data technologies change that: It’s easier and cheaper to store all of your data, which means it’s okay if business priorities change or just expand.
Now, having said that, an interesting counterpoint is an Inside Analysis piece about in-memory analytics and Teradata’s new Intelligent Memory solution. It goes against the Big Data grain to a certain extent by prioritizing data based on its use.
It doesn’t get rid of the data, mind you — and in that way, it’s very different from the approaches IT had to rely on in the past. Instead, it’s built on the assumption that not all data is created equal, so only certain sets of data need to be accessed on a regular basis. As it turns out, only about 20 percent of a database’s data will be used in most of the query activity.
Therefore, the rest of the data doesn’t need to sit in-memory.
What’s really interesting is that Teradata’s Intelligent Memory uses an algorithm to determine what’s hot and what’s not, as opposed to requiring manual intervention. “Using patented algorithms, Teradata tracks the temperature of all the data on its servers and ranks it, all the while adjusting to new query patterns as they arise,” Inside Analysis explains. “Teradata’s Intelligent Memory simply acts as a new resource layer to be considered along with normal memory, SSD and disk. It’s AI.”
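Teradata’s patented algorithms aren’t public, but the general idea of temperature-based tiering is easy to sketch: count how often each block of data gets queried, and keep only the hottest fraction in memory. A minimal illustration (the class and names here are my own, purely hypothetical, not Teradata’s API):

```python
# A minimal sketch of temperature-based data tiering: every query touch
# raises a block's "temperature," and only the hottest ~20 percent of
# blocks earn a slot in memory; the rest stay on SSD or disk.
from collections import Counter


class TemperatureTier:
    def __init__(self, memory_fraction=0.2):
        self.temps = Counter()                # access counts per data block
        self.memory_fraction = memory_fraction

    def record_access(self, block_id):
        """Record one query touch, raising the block's temperature."""
        self.temps[block_id] += 1

    def hot_blocks(self):
        """Return the hottest blocks, i.e. the ones worth keeping in memory."""
        n_hot = max(1, int(len(self.temps) * self.memory_fraction))
        return {block for block, _ in self.temps.most_common(n_hot)}


# Simulated query traffic: one table is hit far more often than the others,
# so the ranking adjusts automatically as new query patterns arrive.
tier = TemperatureTier()
for block in ["orders", "orders", "orders", "customers", "archive_2008", "orders"]:
    tier.record_access(block)

print(tier.hot_blocks())  # prints {'orders'}
```

The point isn’t the ten lines of Python; it’s that the ranking is continuous and automatic, so the memory tier reshapes itself as query patterns shift, with no DBA deciding up front which data matters.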
In other words, it does create silos — but not in any structurally permanent way, which is a huge advantage over past approaches that just added more data silos.