Archiving and retention of unstructured content (e.g., documents, files, emails, photos and videos) is a billion-dollar market, driven largely by increasingly strict compliance and regulatory requirements. Content-addressable storage (CAS) devices that are immutable (i.e., will not permit editing information once it has been stored) are now widely used to guarantee the integrity of unstructured data. Each object is located by an address derived from its content, and additional policies can be set that govern the lifecycle (retention and expiry) of the document or object.
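To make the CAS model more concrete, here is a minimal in-memory sketch. It assumes SHA-256 as the content hash and a single retention date per object; real CAS devices use their own addressing schemes and vendor APIs, and the class and method names below are hypothetical. Because an object's address is the hash of its bytes, identical content always maps to the same address, there is no update operation, and deletion is refused until the retention period has expired.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class StoredObject:
    # Hypothetical record: the immutable bytes plus the earliest allowed expiry.
    data: bytes
    retain_until: datetime


class ContentAddressableStore:
    """Toy in-memory CAS (illustrative only): objects are keyed by SHA-256 of their content."""

    def __init__(self) -> None:
        self._objects: dict[str, StoredObject] = {}

    def put(self, data: bytes, retention: timedelta, now: datetime) -> str:
        # The address is derived from the content itself, so it cannot be chosen or reused.
        address = hashlib.sha256(data).hexdigest()
        retain_until = now + retention
        existing = self._objects.get(address)
        if existing is not None:
            # Same content already stored: never shorten an existing retention period.
            retain_until = max(existing.retain_until, retain_until)
        self._objects[address] = StoredObject(data, retain_until)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address].data
        # Integrity check: stored bytes must still hash to their address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("integrity check failed")
        return data

    # Note: there is deliberately no update() method; content is immutable.

    def delete(self, address: str, now: datetime) -> None:
        # Expiry is enforced: deletion is only allowed after the retention date.
        if now < self._objects[address].retain_until:
            raise PermissionError("retention period has not expired")
        del self._objects[address]


now = datetime(2011, 1, 1, tzinfo=timezone.utc)
store = ContentAddressableStore()
addr = store.put(b"invoice #1001", timedelta(days=7 * 365), now)
print(store.get(addr))
```

Changing even one byte of a stored record would produce a different address, which is why content addressing pairs naturally with the integrity guarantees described above.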
Meanwhile, there's no denying the growth of structured data retention needs (Gartner forecasts a CAGR of 27 percent, making it the fastest-growing archiving category). This growth is driven in part by the popularity of initiatives such as application retirement, in which legacy applications are decommissioned and their data is moved to a lower-cost storage tier but retained for on-demand access. CAS would appear to be ideal for retaining such structured data but for one big problem: relational databases such as Oracle, Sybase, SQL Server and others do not run natively on CAS devices, and you wouldn't want them to. After all, a big reason for decommissioning applications is to avoid the cost of paying maintenance on, and administering, hundreds or thousands of databases built to support transactional updates, when the new goal is to keep data immutable and guaranteed free of change.
In addition to enterprise application retirement, increasing volumes of machine-generated data (e.g., telco call data records, automated financial trades, smart meter sensor data) are fueling the market for cost-effective compliant storage. The twist with machine-generated data is its extreme volume, often billions of records a day, which requires ingestion rates that relational databases cannot support. Enter the increasingly popular alternatives of Hadoop and NoSQL databases, which can handle Big Data. While these platforms are more cost-effective and scalable than traditional RDBMSs, compliance and long-term retention are clearly not their focus.
Given the variety of structured and unstructured data types, ingestion rates and data volumes, along with the desire for immutability, a dream solution might be described as one that could:
1. Ingest data at extreme volumes (billions of records per day)
2. Massively reduce storage needs through compression optimized for each data type
3. Comply with regulations by enforcing immutability with full lifecycle retention and expiry management
4. Query objects or fine-grained structured data records without the need to decompress or restore data
The exciting news is that this solution is a reality today: a new breed of online data retention repositories is ready to deploy on CAS devices, making these capabilities available in 2011, fully two years ahead of Gartner's 2013 forecast.