Like the Borg, Big Data is hurtling toward the enterprise. You can't stop it, but at least you can prepare for its arrival. There is no shortage of advice as to how to do that when it comes to database applications and storage and processing upgrades, as any casual Internet search will show. What gets short shrift in the trade press, unfortunately, is Big Data's networking requirements. Neither the data nor the intelligence is of much value if it can't get to its proper destination in a reasonable amount of time.
When it comes to Big Data, network infrastructure is in fact one of the top priorities among enterprise executives. At its heart, the Big Data challenge is a question of scale, and according to a recent survey conducted by QuinStreet, parent company of Enterprise Networking Planet, more than 40 percent of the 540 IT decision-makers polled say increasing network bandwidth is a top priority in preparing infrastructure for Big Data.
Networking outpolled expansion to server and storage infrastructure on the priority list and even beat out deployment of new cloud infrastructure as a means to add scale to internal infrastructure. In fact, networking was cited more often than the establishment of the analytics platforms themselves. It came in second only to the need for easy-to-use tools for the compiling and handling of Big Data loads, and even then, only by a fraction.
"There is tremendous focus on the compute and storage side, and for good reason," said Bithika Khargarian, corporate architect for vertical solutions at Extreme Networks. "But when you're talking about distributing your computational resources, you still need a robust network to connect them."
The need to distribute large data loads across multiple devices actually runs counter to what most enterprise infrastructure platforms are designed for these days. The last big trend, server and storage virtualization, was all about aggregating workloads onto a single piece of hardware. Distribution, of course, can't happen without networking, and the first question that comes up is, do we have enough bandwidth? With many organizations still relying on 1 GbE, even in core switching infrastructure, will the standard upgrade to 10 GbE intended to support virtual environments be enough, or should the enterprise plan to jump directly to 40 or even 100 GbE?
"Obviously, it will depend on the use case," said Brad Casemore, director of research for data center networks at IDC. "Many organizations are in the early stages of running their [1 GbE or 10 GbE] networks, and even if they know they will need a certain amount of bandwidth and certain latency requirements, they are not ready to expend a ton of money on them. But businesses that are serious about Big Data and are investing heavily in 10 GbE may want to bypass 40 and go directly to 100. It depends on whether you want to be in the vanguard."
To figure out where you fit on the adoption scale, it helps to make a clear-eyed assessment not only of your needs today, but of your future needs as well. One way to do this, says Juniper's Calvin Chai, is to look at key metrics like access requirements, oversubscription ratios, buffers and latency.
"You're going to want 10 GbE, for sure," he said. "You'll want line rate switches so the backplane will be able to handle full duplex on the server-facing ports. When oversubscribing 10 GbE to the server and 40 G uplinks, you don't want to exceed a 3:1 ratio. And with buffering, it's good to go with 1 MB or higher."
Latency is more of a judgment call, he added. It is unlikely that organizations will have to build ultra-low, sub-microsecond capabilities into their infrastructure, but it does help to have consistency from port to port.
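To make the 3:1 oversubscription guideline concrete, here is a minimal sketch of the arithmetic. The port counts (48 server-facing ports, 4 or 2 uplinks) are illustrative assumptions, not figures from Chai or any particular switch, and the function names are hypothetical.

```python
# Sketch of the oversubscription arithmetic for a single access switch.
# Port counts below are assumed for illustration only.

MAX_RATIO = 3.0  # the 3:1 ceiling cited in the article


def oversubscription_ratio(server_ports, server_gbps, uplink_ports, uplink_gbps):
    """Return total server-facing bandwidth divided by total uplink bandwidth."""
    downlink = server_ports * server_gbps
    uplink = uplink_ports * uplink_gbps
    if uplink <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return downlink / uplink


def within_guideline(ratio, limit=MAX_RATIO):
    """True when the ratio does not exceed the chosen ceiling."""
    return ratio <= limit


if __name__ == "__main__":
    # Assumed example: 48 x 10 GbE server ports, 4 x 40 GbE uplinks.
    four_uplinks = oversubscription_ratio(48, 10, 4, 40)
    # Same switch with only 2 x 40 GbE uplinks.
    two_uplinks = oversubscription_ratio(48, 10, 2, 40)
    for label, ratio in (("4 uplinks", four_uplinks), ("2 uplinks", two_uplinks)):
        verdict = "ok" if within_guideline(ratio) else "exceeds 3:1"
        print(f"{label}: {ratio:.1f}:1 ({verdict})")
```

Under these assumed port counts, four 40 GbE uplinks land exactly on the 3:1 ceiling, while halving the uplinks doubles the ratio to 6:1.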
Still, bandwidth and throughput are not the only network aspects that must be addressed in order to accommodate Big Data. Most enterprises will have to deploy entirely new network architectures as well.
According to Gartner research director Andrew Lerner, a flattened fabric architecture provides not only improved connectivity and flexibility for dynamic virtual environments, but also the rapid scalability needed to handle the intermittent nature of Big Data loads.
"Big Data applications tend to be very bursty," he said. "That type of application runs best on a fabric architecture. Also, fabrics do not need extensive tiering because they are not running client-server traffic as much – there is more east/west, server-to-server traffic."
Fabrics scale more easily as well, which allows a closer match between resource consumption and data loads. When we're talking about data that can easily stretch into the terabytes, even a small gain in utilization rates can save a bundle in operating costs.
But before you think that Big Data will require a full rip-and-replace of legacy network infrastructure, note that most forward-leaning organizations are opting for greenfield deployments in support of Hadoop and other heavy workloads. At the same time, tools like MapReduce help to manage the parallel processing requirements of Big Data across multiple compute clusters, and by extension lighten the load on the networking side as well.
Ultimately, Big Data is more of an opportunity than a challenge. True, the extra load will require changes to data and network infrastructure, but the payback in business and operational intelligence, not to mention the operational efficiencies that can be transferred to standard application loads, is likely to remake the data environment as we know it.
It would appear, then, that all the tools needed to build Big Data infrastructure are already here. And as the QuinStreet survey indicated, enterprises across the board are squaring up to meet the challenge.
However, systems are resources that are only as good as the processes they support and the people who oversee them. In that vein, then, the real challenge is not capturing or analyzing Big Data, but figuring out what to do with the results. Is your network infrastructure ready to help deliver those results?