It is widely recognized that Big Data must also be Fast Data if it is to provide any real value to the enterprise. But many organizations are just now starting to realize what a significant challenge this will be. After all, storage capacity is always available somewhere to absorb data loads. But getting to that data, searching it, analyzing it and producing actionable results is another matter, and it is all the more difficult when you consider the ephemeral nature of much of that data and the need for the entire system to function on a real-time or near-real-time basis.
From an infrastructure perspective, the biggest danger is following “cookie-cutter” best practices for Big Data, says IT analyst Wayne Kernochan. Capacity and speed often work at cross purposes, so satisfying both within a common architecture will require a significant amount of fine-tuning. Big Data, for instance, places a premium on in-house Hadoop support, cloud-enabled software and massive storage capacity. Fast Data is all about handling reams of sensor-driven traffic, so it requires rapid database updating and initial analytics capability, which is best supported by NVRAM and SSD storage. Combining the two, therefore, will require on-disk separation of Big Data and Fast Data, as well as common access to the Fast Data stores by Big Data databases and analytics tools.
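To make Kernochan’s point about on-disk separation concrete, here is a minimal sketch of the routing idea, assuming a simple age-based cutoff; the tier paths, threshold and record format are illustrative stand-ins rather than anything prescribed in his analysis. Recent, fast-moving records land on an NVRAM- or SSD-backed tier, aged records fall through to bulk capacity storage, and both remain reachable through the same naming scheme so analytics tools can address either.

    import os
    import time

    # Illustrative stand-ins: a real deployment would map these paths to an
    # NVMe/SSD-backed volume and a bulk HDD or object-storage mount.
    FAST_TIER = "./fast_tier"          # Fast Data: recent, rapidly changing records
    CAPACITY_TIER = "./capacity_tier"  # Big Data: everything that has aged out

    HOT_WINDOW_SECONDS = 3600  # assumed cutoff: records stay "hot" for an hour

    def tier_for(record_timestamp: float) -> str:
        """Route by age: hot records go to the fast tier, older ones to capacity."""
        age = time.time() - record_timestamp
        return FAST_TIER if age < HOT_WINDOW_SECONDS else CAPACITY_TIER

    def write_record(record_id: str, payload: bytes, record_timestamp: float) -> str:
        """Write a record to whichever tier its age dictates and return the path."""
        tier = tier_for(record_timestamp)
        os.makedirs(tier, exist_ok=True)
        path = os.path.join(tier, f"{record_id}.bin")
        with open(path, "wb") as fh:
            fh.write(payload)
        return path

    # A fresh sensor reading lands on the fast tier; a day-old one does not.
    print(write_record("sensor-001", b"\x01\x02", time.time()))
    print(write_record("sensor-002", b"\x03\x04", time.time() - 86_400))

In practice this separation is usually enforced at the storage or file-system layer, through tiered volumes or storage policies, rather than in application code, but the routing decision is the same one Kernochan describes.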
Striving for Big Data without addressing Fast Data is like buying a house that you can’t afford or maintain, says iCrunchData News’ Emma Saunders. The only way a dataset becomes big, in fact, is if the underlying systems and resources can’t cope with it, so if your normal infrastructure routinely handles a million lines of semi-static data, there is no need to upgrade to a Big Data footing. But if it folds under 100,000 lines of rapidly changing, highly time-sensitive workloads, then you’ll need to deploy some advanced data handling capabilities. Again, the issue is not that the loads are too large but that they need to be addressed much more rapidly than standard enterprise data.
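As a toy illustration of Saunders’ argument (the thresholds and helper below are invented for this sketch, not drawn from her piece), the trigger for an upgrade is how quickly the data changes relative to what the existing stack can absorb, not the row count by itself:

    def workload_class(rows: int, updates_per_second: float,
                       sustainable_updates_per_second: float) -> str:
        """Classify a workload: velocity beyond what the current stack can
        absorb is what forces an upgrade, not raw volume on its own."""
        if updates_per_second > sustainable_updates_per_second:
            return "fast data: needs streaming ingest and rapid analytics"
        if rows > 10_000_000:  # illustrative volume threshold only
            return "big data: needs scale-out storage and batch analytics"
        return "normal: existing infrastructure copes fine"

    # A million semi-static lines vs. 100,000 rapidly changing ones:
    print(workload_class(1_000_000, 0.5, 1_000))   # -> normal
    print(workload_class(100_000, 50_000, 1_000))  # -> fast data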
How fast is fast? Faster than mere humans can comprehend, certainly, says Adobe’s Matt Asay. Once we enter the realm of machine learning and real-time analysis and processing, human intervention ceases to be a design objective. Instead, you’ll have NoSQL databases like Cassandra and MongoDB responding to queries immediately and then pushing the clickstream directly into the Hadoop engine for deeper analysis. This, in turn, will generate a return feed back to the database for further action. As even recent developments like MapReduce give way to real-time Spark functionality, the connection between analytics and transactional processing grows ever tighter, so intervention at the glacial pace of the human mind will serve only to diminish the value of the data and of the investment in real-time Big Data infrastructure.
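Asay’s loop, in which a serving database feeds a streaming engine that pushes results straight back with no human in the path, can be sketched with Spark Structured Streaming. This is a minimal illustration, not his or Adobe’s implementation: the built-in rate source stands in for a Cassandra or MongoDB clickstream, the window size and column names are assumptions, and the write-back is simulated with a console display where a real pipeline would upsert into the serving database.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("fast-data-feedback-sketch").getOrCreate()

    # The built-in "rate" source stands in for clickstream events landing in a
    # NoSQL serving store; it emits (timestamp, value) rows continuously.
    clicks = (spark.readStream.format("rate").option("rowsPerSecond", 100).load()
              .withColumn("user_id", (F.col("value") % 50).cast("string")))

    # Rolling per-user counts approximate the "deeper analysis" stage.
    counts = clicks.groupBy(F.window("timestamp", "10 seconds"), "user_id").count()

    def push_back(batch_df, batch_id):
        # A real deployment would upsert these aggregates into Cassandra or
        # MongoDB so the serving layer can act on them immediately; here the
        # feedback step is simulated by printing the top results.
        batch_df.orderBy(F.desc("count")).limit(5).show(truncate=False)

    query = (counts.writeStream.outputMode("complete")
             .foreachBatch(push_back).start())
    query.awaitTermination(30)  # run the sketch briefly, then exit

Swapping the rate source for a Kafka or database connector, and the console display for an upsert, turns the same structure into the kind of closed analytics-to-transaction loop Asay describes.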
The key test for all of this emerging technology will come over the next three months, and indeed during each holiday season for the remainder of the decade, says Database Journal’s Lockwood Lyon. With data loads rising some 40 percent during this period, both the strengths and the weaknesses of Big Data infrastructure will become evident, which should guide development over the next year. Already, the fruits of this cycle are ripening: the storage and retrieval capabilities of today’s operational systems are driving the need to shore up the analytics side of the house. The immediate task is to ascertain the degree to which this level of analytics will drive scale in raw storage and processing without adding latency to the overall environment.
The speed factor in Big Data is not merely about convenience. To be sure, nobody likes to wait while applications load or queries run, but the need for real-time performance is driven more by the desire to capitalize on opportunities than by simple user functionality. When sales opportunities or traffic alleviation must be addressed within a brief window of time, the system has to maintain a high state of readiness and performance in order to provide truly worthwhile service.
The challenge going forward will be to ensure that Big Data can get bigger without getting slower as well.
Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.