If you think you know what Big Data is going to be like based on the volume of today’s workflows, well, to borrow a phrase, “you ain’t seen nothin’ yet.”
The fact is that with the sensor-driven traffic of the Internet of Things barely under way, the full data load that will eventually hit the enterprise will be multiple orders of magnitude larger than it is today, and much of it will be unstructured and highly ephemeral, meaning it must be analyzed and acted upon quickly or it loses all value.
The good news is that much of the processing will be done at the edge, where it can be leveraged for maximum benefit without flooding centralized resources. But a significant portion will still make it to the data center or the data lake, which means the enterprise will need to implement significant upgrades to infrastructure throughout the distributed data environment, and soon.
The main challenge in building Big Data infrastructure will be finding a way to support the new class of analytics that comes into play, says Tom Krazit, executive editor of the annual Structure Data conference. In the old days, the process involved identifying a problem first, then collecting the necessary data and running the analytics. With Big Data, the sheer number of events will break this system, and even if you could produce actionable intelligence on the back end, it will most likely come too late to be of any use. In the new world, machine learning and other data handling techniques will be working constantly to prioritize and contextualize data, while end-to-end resource orchestration platforms strive to balance loads, match requirements with capabilities, and otherwise prevent data loads from crashing systems and blowing budgets.
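The kind of continuous prioritization described above can be illustrated with a minimal sketch. This is not any vendor's actual implementation; the `score` heuristic (weighting by source type and freshness) is a hypothetical stand-in for a trained machine learning model, and the event fields are invented for the example. The point is the shape of the pipeline: events are scored as they arrive and surfaced most-actionable-first, rather than collected and analyzed after the fact.

```python
from dataclasses import dataclass, field
import heapq


@dataclass(order=True)
class Event:
    priority: float                    # lower value = higher urgency (min-heap)
    payload: dict = field(compare=False)


def score(payload: dict) -> float:
    """Toy priority score: source weight and freshness stand in for a real model."""
    weight = {"sensor": 1.0, "log": 0.5, "batch": 0.1}.get(payload.get("source"), 0.2)
    age = payload.get("age_seconds", 0)
    return weight / (1 + age)          # fresher, higher-weight events score higher


def prioritize(events):
    """Yield events most-actionable-first, so analytics sees urgent data sooner."""
    heap = [Event(-score(e), e) for e in events]   # negate: heapq is a min-heap
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap).payload
```

In a production setting the heap would be fed continuously and the scoring model retrained as load patterns shift; the sketch only shows the ordering step.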
Identifying the need for this kind of advanced infrastructure is one thing; implementing it is quite another. Two key challenges are incorporating machine learning into production workloads and converting idle backup assets into active data copies, according to analytics developer Talena. The company’s new ActiveRx platform seeks to address both problems through a mix of prescriptive platform analytics and active copy analytics. The idea is to use machine learning algorithms to streamline infrastructure usage for functions like backup job processing and service level monitoring, while at the same time producing rich visualization of backup data in Cassandra, Hive, Spark or even customized engines. In this way, organizations gain an integrated analytics environment with minimal data generation and resource consumption.
Another technique that is likely to come into play as Big Data infrastructure ramps up is stream processing, says Silicon Angle’s Maria Deutscher. Start-ups like Striim and Alooma have worked out processes in which analytics can be applied to data in transit, which allows users to respond to opportunities faster and even respond to developments as they are happening in order to affect the outcome. Striim, for example, says it can isolate various metrics within the data stream and check them against data that has already been ingested into analytics to produce new insights, all without disrupting the data flow. As more and more organizations start to rely on real-time analytics to gain a competitive edge, stream processing will likely become a key component of Big Data infrastructure.
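The in-transit checking that Striim describes can be sketched in a few lines. To be clear, this is not Striim's or Alooma's actual engine; the `StreamChecker` class and its z-score test are an assumed, simplified illustration of the general pattern: values are compared against statistics from data already ingested, an anomaly flag is attached, and the values themselves pass through untouched so the data flow is never disrupted.

```python
from collections import deque


class StreamChecker:
    """Flag in-flight metric values that deviate from an ingested baseline.

    Values pass through unchanged; only an anomaly flag is attached,
    so downstream consumers see the original stream.
    """

    def __init__(self, baseline_mean, baseline_std, threshold=3.0, window=100):
        self.mean = baseline_mean          # derived from already-ingested data
        self.std = baseline_std
        self.threshold = threshold         # z-score cutoff for flagging
        self.recent = deque(maxlen=window) # rolling buffer for later re-baselining

    def process(self, stream):
        """Yield (value, is_anomaly) pairs without buffering the whole stream."""
        for value in stream:
            self.recent.append(value)
            z = abs(value - self.mean) / self.std if self.std else 0.0
            yield value, z > self.threshold
```

Because `process` is a generator, it works on an unbounded stream with constant memory, which is the property that makes this approach attractive for real-time use.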
Of course, another way to manage Big Data infrastructure is to simply let someone else worry about it. Numerous service models are starting to crop up that will allow organizations to book resources by the hour or by the job in order to produce the required results without the upfront cost and complexity. Alcatel-Lucent (ALE) recently introduced the Network on Demand service that combines its existing Intelligent Fabric, Unified Access and Network Analytics offerings to support IoT, Big Data and other emerging functions on an operational basis. The platform taps various ALE partners for pieces like LAN and Wi-Fi support and is delivered through resellers and other channel partners using automated cloud management toolkits.
While the temptation to ramp up Big Data and IoT infrastructure immediately is strong, lest some bratty start-up with a mobile app come along and disrupt your entire industry, a bit of caution is in order as well. Too much infrastructure too soon leads to overly complex environments that require a great deal of training to oversee, which inevitably leads to confusion, disillusionment and failure.
Data can only be of use if it is managed properly, and even though the full power of Big Data will only arrive with large volume management, it’s probably best to work out the kinks with small loads before pushing the technology to the extreme.
Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.