From Dream to Reality: A Single Enterprise Archive Solution for Big Data Retention

Ramon Chen
Last year, Gartner published a report titled "Enterprise Information Archiving Transforms the Strategy and Approach for Archiving." It forecast that enterprise information archiving (EIA) will become a key infrastructure component and will hold both structured data and unstructured content by 2013. That was quite a bold prediction at the time. In 2010, Gartner also published a "Magic Quadrant (MQ) for Enterprise Information Archiving (EIA)" as a direct replacement for its Email Active Archiving MQ, and the vendors listed in it did not offer any products designed for structured data archiving, let alone a single unified solution.

Archiving and retention of unstructured content (e.g., documents, files, emails, photos and videos) is a billion-dollar market, driven largely by ever stricter compliance and regulatory requirements. Increasingly, content-addressable storage (CAS) devices that are immutable (i.e., will not permit editing information once it has been stored) are used to guarantee the integrity of unstructured data. Objects are easily located, and additional policies can be set that govern the lifecycle (retention and expiry) of each document or object.
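To make the idea concrete, here is a minimal sketch of how content addressing, immutability and a retention policy fit together. It is a toy in-memory model, not how any particular CAS product works; the class name `ContentAddressedStore`, its methods and the `RetentionError` exception are all hypothetical names chosen for illustration.

```python
import hashlib
import time
from typing import Optional


class RetentionError(Exception):
    """Raised when a delete would violate an object's retention period."""


class ContentAddressedStore:
    """Toy in-memory store (hypothetical): objects are keyed by the SHA-256 of their bytes."""

    def __init__(self) -> None:
        self._objects = {}  # address -> (data, expiry timestamp in seconds)

    def put(self, data: bytes, retain_seconds: float, now: Optional[float] = None) -> str:
        """Store data and return its address. Existing content is never overwritten."""
        now = time.time() if now is None else now
        address = hashlib.sha256(data).hexdigest()
        expiry = now + retain_seconds
        if address in self._objects:
            # Identical content is already stored: keep the longer retention period.
            expiry = max(expiry, self._objects[address][1])
        self._objects[address] = (data, expiry)
        return address

    def get(self, address: str) -> bytes:
        """Return the object, checking that it still matches its address."""
        data, _ = self._objects[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object does not match its address")
        return data

    def delete(self, address: str, now: Optional[float] = None) -> None:
        """Remove an object, but only once its retention period has expired."""
        now = time.time() if now is None else now
        _, expiry = self._objects[address]
        if now < expiry:
            raise RetentionError("object is still under retention")
        del self._objects[address]
```

Because the address is derived from the content, there is no update operation: changing a single byte produces a different address, so the original object stays intact until its retention period ends.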

Meanwhile, there's no denying the growth of structured data retention needs (Gartner forecasts a CAGR of 27 percent, making it the fastest-growing archiving category). This growth is driven in part by the popularity of initiatives such as application retirement, whereby legacy applications are decommissioned and their data is placed in a lower-cost tier of storage but retained for on-demand access. CAS storage would appear to be ideal for retaining such structured data but for one big problem: relational databases such as Oracle, Sybase, SQL Server and others do not run natively on CAS devices, and you wouldn't want them to. After all, a big reason for decommissioning apps is to avoid the tax of paying maintenance on, and administering, hundreds or thousands of databases that support transactional updates, when the new goal is to keep data immutable and guaranteed free of change.

In addition to enterprise application retirement, increasing volumes of machine-generated data (e.g., telco call data records, automated financial trades, smart meter sensor data) are fueling the market for cost-effective compliant storage. The twist with machine-generated data is the extreme volumes, often billions of records a day, requiring ingestion rates that relational databases cannot support. Enter the increasingly popular alternative of Hadoop and NoSQL databases that can handle Big Data. While these are more cost effective and scalable than traditional RDBMSs, compliance and long-term retention are clearly not their focus.

Given the variety of structured and unstructured data types, ingestion rates and data volumes, and the desire for immutability, a dream solution might be described as one that could:

1. Ingest data at extreme volumes (billions of records per day)

2. Massively reduce storage needs through compression optimized for each data type

3. Comply with regulations by enforcing immutability with full lifecycle retention and expiry management

4. Query objects or fine-grained structured data records without the need to re-inflate or restore data

All within a scalable, low-admin framework that results in the lowest possible TCO that you would expect of a single enterprise information archive.

The exciting thing is that this solution is a reality today, with a new breed of online data retention repositories ready to deploy on CAS storage. Better still, these capabilities are available in 2011, fully two years ahead of Gartner's 2013 forecast.

Jun 29, 2011 11:06 AM Peter Schwoerer says:
When considering an archiving solution, particularly for structured data, it is important to consider data access. A system such as Informatica's archiving allows for "Live" archiving of data, providing seamless access between production and archive data. Additionally, when utilized for application retirement, Informatica provides up to 98% compression of data, yielding significant storage savings. Lastly, with Informatica's data services, users are able to have combined access to production data and retired applications.
