OK, you're not Twitter and you're not going to be storing any tweets, much less a trillion. But big sets of data are a growing concern for organizations across industries, particularly as we get more data from online use and sensors.
So, it's fascinating to see how cutting-edge companies such as Twitter and Google are tackling the problem. And this article offers a readable explanation of the challenge and how these companies are solving it.
This jump in storage can cost big money, and companies want to minimize that cost. So Twitter is opting for a new and little-known format called Protocol Buffers, developed by (no surprise here) Google. The really cool thing about this approach is "it can automate the process of recreating the data structures within applications," according to the article. That's cool because you can define the data structure once and then easily generate the source code for using it in different programs. What's more, you can update the data structure without breaking the programs that still use the old format, which, I think your development team will agree, is super great.
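For the developers in the audience, here is a rough Python sketch of why that last point works. It is not the real Protocol Buffers wire format or API, just a toy tag-length-value encoding that illustrates the same idea: every field carries a numeric tag, so an older program can skip the tags it doesn't recognize. The field names (user, text, location) and the two schema versions are made up for illustration.

```python
import struct

def encode(fields):
    """Encode {tag: bytes} as a sequence of (tag, length, value) records."""
    out = bytearray()
    for tag, value in fields.items():
        # 1-byte tag, 2-byte big-endian length, then the raw value.
        out += struct.pack(">BH", tag, len(value))
        out += value
    return bytes(out)

def decode(data, known_tags):
    """Decode records, keeping known tags and silently skipping unknown ones."""
    result = {}
    pos = 0
    while pos < len(data):
        tag, length = struct.unpack_from(">BH", data, pos)
        pos += 3  # size of the tag + length header
        value = data[pos:pos + length]
        pos += length
        if tag in known_tags:
            result[known_tags[tag]] = value.decode("utf-8")
    return result

# Hypothetical schemas: version 2 adds a "location" field under a new tag.
V1_SCHEMA = {1: "user", 2: "text"}
V2_SCHEMA = {1: "user", 2: "text", 3: "location"}

# A message written by a newer program that knows about "location".
new_message = encode({1: b"alice", 2: b"hello", 3: b"Paris"})

old_view = decode(new_message, V1_SCHEMA)  # older program ignores tag 3
new_view = decode(new_message, V2_SCHEMA)  # newer program sees all fields
```

The older program reads the newer message without choking on the extra field, and the newer program can still read older messages that simply lack it. The real format is more compact and more capable, but the compatibility trick rests on the same numbered-field principle.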
The article also talks about Hadoop, which is an open source framework for storing and processing huge amounts of data. You can learn more about Hadoop's business uses by reading Mike Vizard's post, "The Need for Speed with Big Data," and my recent interview with Doug Cutting, which ran in two parts: "Creator of Hadoop Explains Why It's More than Just Storage" and "How Companies Are Using Hadoop."