Today the term "Big Data," as it applies to structured data, is mostly synonymous with high-performance analytics running against extreme volumes of data. Its popularity for business insight comes as no surprise, given the continuous data deluge reported by IDC and featured in The Economist, and the desire to achieve price/performance efficiencies comparable to those of Big Data pioneers such as Google and Facebook. Make no mistake, Big Data analytics is big business, with vendors such as Aster Data, Greenplum (recently acquired by EMC), Cloudera and Vertica offering products and services in a crowded space dominated by heavyweights such as Teradata, Netezza, IBM and Oracle.
While analytics are important and beneficial to a business, data retention represents a fast-growing priority driven by industry-mandated compliance and business imperatives. Examples include the much publicized Health Information Technology for Economic and Clinical Health (HITECH) Act, which provides grants and incentives for organizations that maintain and use electronic health records (EHRs) for treatment, and stiff fines for those that cannot comply by designated deadlines.
From a business or best-practices point of view, increased access to historical health care data can also help provide a more accurate diagnosis and even save a life. In financial services, new rules for financial regulation, litigation, and high-profile government investigations, in combination with well-known acts such as Sarbanes-Oxley (SOX), are driving the need for extended retention and on-demand access to historical data. In this case, the possibility of large fines and even jail time makes data retention a priority at the highest levels of management.
Companies in many other industries face similar imperatives, so when push comes to shove, you can analyze for show, but you need to comply for dough.
Even as the cost of storage continues to fall, increasing data volumes, extended retention periods, and on-demand access requirements are placing pressure on Online Transaction Processing (OLTP) and Online Analytics Processing (OLAP) systems and repositories. In other words, you have a Big Data retention problem when your retention needs require you to add resources and capacity at an alarming rate.
Since OLTP and OLAP systems are optimized for high-performance transaction processing and analytics, respectively, they carry a correspondingly high operational cost. Continuing to hold historical, less frequently accessed data significantly degrades the performance of these production-critical systems and magnifies the cost of the specialized storage, servers and people skills required for ongoing operations.
Even without this new wave of Big Data, an average production system has been found to be bloated with over 90 percent static or historical data. Many traditional and non-traditional options are available, including gaining a better understanding of data and storage usage, tiering by relocating less frequently accessed data to lower-cost physical storage, and even good old-fashioned offline tape archiving. However, stringent compliance requirements, coupled with granular, record-level configurable retention and expiry needs, have given rise to an entirely new category of repository for Online Data Retention (OLDR).
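To make "granular, record-level configurable retention and expiry" concrete, here is a minimal sketch of the idea in Python. The record classes and retention periods below are purely illustrative assumptions, not requirements from HITECH, SOX, or any other regulation, and the function names are hypothetical:

```python
from datetime import date, timedelta

# Hypothetical retention policy, configurable per record class.
# The classes and periods here are illustrative only.
RETENTION_PERIODS = {
    "ehr": timedelta(days=365 * 7),                  # e.g., patient records
    "trade_confirmation": timedelta(days=365 * 6),   # e.g., financial records
    "email": timedelta(days=365 * 3),
}

def is_expired(record_class: str, created_on: date, today: date = None) -> bool:
    """Return True once a record has outlived its configured retention period."""
    today = today or date.today()
    return created_on + RETENTION_PERIODS[record_class] < today

# A trade confirmation created in 2001 has outlived a 6-year
# retention period by 2011 and is eligible for expiry.
print(is_expired("trade_confirmation", date(2001, 1, 1), today=date(2011, 1, 1)))
```

In a real OLDR repository this policy check would be evaluated per record at scale, with expiry actions (purge, legal hold, or migration to cheaper storage) driven by the same configuration.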