In the age of Big Data technology, it’s tempting to ignore that last step in data’s life cycle: Deletion.
But while you may theoretically be able to keep everything, there are several business reasons why life cycle management matters more with Big Data.
1. Big Data grows ridiculously fast. Plenty of statistics demonstrate how fast unstructured and semi-structured data grows. Each day, we create 2.5 exabytes of data, according to Maria C. Villar, managing partner of Business Data Leadership. “A single jet engine can generate 10TB of data in 30 minutes,” she adds.
Smart meters and heavy industrial equipment generate equivalent amounts, she notes. All that sensor and web chatter adds up too quickly.
“It is critically important to develop a life cycle management strategy that includes archiving and deletion policies, business rules and IT automation,” Villar warns in an Information Management article. “The company will not be able to store all this incoming data forever.”
2. Most Big Data is ephemeral by nature. Not only does it grow like kudzu on Miracle-Gro, but much of it quickly becomes too dated to be useful.
Sensor data, social media mining and even web logs can be analyzed for trends, but when you’re using this data for real-time decisions, as companies often are, it quickly becomes stale.
3. Out-of-date Big Data can undermine the results of your business analytics. One reason there’s a huge focus right now on streaming data and real-time analytics is that companies are trying to manage by exception. Organizations don’t care when things go right, but they very much care when a wind turbine is about to fail. This kind of real-time monitoring plays a major role in many Big Data use cases, and stale data left in the mix can obscure the very exceptions that monitoring is meant to catch.
“With Big Data, which can be unpredictable and come in many different sizes and formats, the process isn't so easy,” writes Mary Shacklett, president of technology research and market development firm Transworld Data. “Yet if we don't start thinking about how we are going to manage this incoming mass of unstructured and semi-structured data in our data centers--where images, videos and documents are growing at a clip of 80 percent--we may never be able to lift our heads from under it!”
In a recent TechRepublic article, Shacklett offers two steps to help IT manage all this Big Data.
“First, that some old fashioned data management meetings--this time about big data--should be held at both the strategic and operational levels,” she writes. “Second, if IT hasn't already done so, it should get aggressive in the data center, putting into play technologies that are proven and ready to harness the Big Data that daily enters corporate portals.”
Timeliness should also be part of any Big Data discussion, Villar writes. She suggests you incorporate three time criteria into your data quality process and metrics:
- How often do we need to get this data into the internal systems? Every 15 minutes? Hourly? Daily? Is that happening?
- How frequently does this data need to be refreshed to be useful? Is that happening?
- When is data too dated to provide value to the business? Make sure you establish an ongoing archiving maintenance program that will remove the data as it becomes dated.
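To make the three time criteria concrete, here is a minimal Python sketch of how they might be expressed as automated checks. It is an illustration under stated assumptions, not a method from Villar's article: the `TimelinessPolicy` and `Record` types, the function names and the threshold values are all hypothetical, and a real archiving program would run against your actual storage systems rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TimelinessPolicy:
    """Hypothetical per-dataset thresholds; the values are illustrative only."""
    ingest_interval: timedelta   # how often data should reach internal systems
    refresh_interval: timedelta  # how often data must be refreshed to stay useful
    max_age: timedelta           # past this age, data is too dated to keep


@dataclass
class Record:
    key: str
    created_at: datetime


def ingest_on_schedule(last_ingest: datetime, now: datetime,
                       policy: TimelinessPolicy) -> bool:
    """Criterion 1: is the data arriving as often as required?"""
    return now - last_ingest <= policy.ingest_interval


def refresh_on_schedule(last_refresh: datetime, now: datetime,
                        policy: TimelinessPolicy) -> bool:
    """Criterion 2: is the data refreshed often enough to be useful?"""
    return now - last_refresh <= policy.refresh_interval


def split_dated(records: list[Record], now: datetime,
                policy: TimelinessPolicy) -> tuple[list[Record], list[Record]]:
    """Criterion 3: separate records to keep from records too dated to keep."""
    keep, dated = [], []
    for record in records:
        if now - record.created_at > policy.max_age:
            dated.append(record)  # candidate for archiving or deletion
        else:
            keep.append(record)
    return keep, dated


if __name__ == "__main__":
    # Assumed thresholds: 15-minute ingest, hourly refresh, 30-day retention.
    policy = TimelinessPolicy(
        ingest_interval=timedelta(minutes=15),
        refresh_interval=timedelta(hours=1),
        max_age=timedelta(days=30),
    )
    now = datetime(2024, 1, 31, 12, 0)
    records = [
        Record("turbine-17", datetime(2024, 1, 30)),
        Record("turbine-04", datetime(2023, 12, 1)),
    ]
    keep, dated = split_dated(records, now, policy)
    print("keep:", [r.key for r in keep])
    print("archive or delete:", [r.key for r in dated])
```

The first two checks could feed a data quality dashboard, while the third is the piece a scheduled maintenance job would run. Whether dated records are archived or deleted outright is a business rule the sketch deliberately leaves open.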