
Hardware Failure: Time to Lighten Up a Little?


Written By

Arthur Cole

Aug 15, 2014


Probably the single most hated word in the IT lexicon is “failure.” Hardware failure, application failure or (shudder) data center failure is enough to strike fear in even the most hardened enterprise tech.

But like measles and scarlet fever, what once seemed terrifying tends to lose its capacity to frighten when new technologies are brought to bear. And as the age of virtual and software-defined architectures unfolds, it could very well turn out that what was once fatal will soon be, if not cured, then at least manageable.

Vantage Data Centers SVP of Operations Chris Yetman, for one, is calling for an end to the zero tolerance for failure that grips most IT shops. As he explained to IT Trends & Analysis recently, focusing on improved recovery and failover will do more to help the bottom line than a zero-failure policy ever will. Not only can you push the utilization rate higher, lowering both capital and operational costs, but the extent and duration of failures will be lessened. Failure is inevitable, so why not focus your energies where they will do the most good: getting back on your feet again?
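The recovery-first argument can be illustrated with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). The sketch below is only an illustration: the function names and the MTBF and MTTR figures are hypothetical and do not come from Yetman or Vantage. It compares halving the failure rate with cutting recovery time from four hours to one.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(mtbf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year for the given MTBF and MTTR."""
    return (1.0 - availability(mtbf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical figures, chosen only to make the comparison concrete.
baseline = annual_downtime_hours(mtbf_hours=1000, mttr_hours=4)
fewer_failures = annual_downtime_hours(mtbf_hours=2000, mttr_hours=4)   # fail half as often
faster_recovery = annual_downtime_hours(mtbf_hours=1000, mttr_hours=1)  # recover 4x faster

for label, hours in [
    ("baseline", baseline),
    ("fewer failures", fewer_failures),
    ("faster recovery", faster_recovery),
]:
    print(f"{label:>16}: {hours:6.2f} hours of downtime per year")
```

Under these assumed numbers, faster recovery removes more downtime than doubling the time between failures, which is the trade-off the recovery-first view relies on. Real environments will have different figures, and neither lever comes free.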

Part of this process will be to redefine your failure domains to reflect the changing nature of data architectures, says Plexxi’s Mike Bushong. For instance, SDN and bare metal switching offer radically different controller architectures, with SDN placing much greater responsibility for network functionality on a single controller. A proper failure domain, then, should cover issues like whether the controller is an active part of the data path and whether you prefer a single domain or several smaller ones to enhance management distribution. And for those running bare metal architectures (or both, as is likely for the time being), domains should properly reflect the convergence and resource pooling that is likely to take place as the enterprise consolidates its infrastructure.
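One way to reason about the single-domain versus several-smaller-domains choice is to measure how much of the environment a single failure takes down. The following is a minimal sketch under simple assumptions: the workload and controller names are made up, and each workload is assumed to depend on exactly one controller, which will not hold for every architecture.

```python
from collections import defaultdict


def blast_radius(assignments: dict[str, str]) -> dict[str, float]:
    """Map each failure domain to the fraction of workloads lost if it fails."""
    counts: dict[str, int] = defaultdict(int)
    for domain in assignments.values():
        counts[domain] += 1
    total = len(assignments)
    return {domain: n / total for domain, n in counts.items()}


# Hypothetical workloads; the names are illustrative only.
workloads = [f"app-{i}" for i in range(8)]

# Option 1: one controller, and therefore one failure domain, for everything.
single = {w: "controller-a" for w in workloads}

# Option 2: four smaller domains, with workloads spread evenly across them.
split = {w: f"controller-{'abcd'[i % 4]}" for i, w in enumerate(workloads)}

worst_single = max(blast_radius(single).values())
worst_split = max(blast_radius(split).values())

print(f"single domain, worst case: {worst_single:.0%} of workloads affected")
print(f"four domains, worst case:  {worst_split:.0%} of workloads affected")
```

Smaller domains shrink the worst case but leave more controllers to manage, so the right split depends on how much management overhead the shop is willing to take on.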

All of this is the difference between simple backup and recovery and full business continuity, says Paul Cash, of UK consulting firm Fruition Partners. With continuity, the focus is on getting service back to normal, which calls for an integrated approach to B&R, systems failover, IT service management and a host of other functions. And the biggest impediment to effective continuity is bad planning, which in itself is usually caused by the set-it-and-forget-it mentality. Enterprise architectures and processes are changing at a rapid pace, so the worst thing for continuity is a plan based on system configurations that are one, five or even 10 years out of date.

Of course, another problem is the continued reliance on popular, but nonetheless complex and inefficient, architectures that make it difficult to swap out and reprovision failed resources. A case in point is the storage area network (SAN), says SIOS Technology’s Jerry Melnick. New SAN-less clustering approaches built on the virtual layer offer replication and failover across multiple hosts with little or no service interruption. The latest SAN-less solutions even offer this functionality across wide geographic areas, offering protection in the event of widespread disasters. And with local solid state storage solutions in the mix, enterprises also gain the benefit of improved application performance and dramatically lower storage costs.

New data paradigms are about more than just advancing technologies. They force changes on the way we build, manage and interact with the data ecosystem. Hardware failure in particular used to be the Code Red of the IT shop, but as functionality moves into the virtual and application layers, the health of a single piece of hardware, or even a collection of pieces, becomes less important.

Failure is still an issue to be dealt with, but if properly planned for, it no longer has to be a crisis.

Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata, Carpathia and NetMagic.

Arthur Cole

With more than 20 years of experience in technology journalism, Arthur has written on the rise of everything from the first digital video editing platforms to virtualization, advanced cloud architectures and the Internet of Things. He is a regular contributor to IT Business Edge and Enterprise Networking Planet and provides blog posts and other web content to numerous company web sites in the high-tech and data communications industries.
