When the Cloud Goes Down

    The enterprise has come to rely on the cloud for an increasing share of its mission-critical workloads. This means cloud infrastructure needs to be available and reliable as never before.

    But there is a disconnect over who is responsible for what when it comes to uptime, and even over what to do when infrastructure fails and business activity comes to a halt.

    A recent Vanson Bourne survey commissioned by cloud management firm Veritas found that nearly 70 percent of enterprises believe that functions like data protection, data privacy and compliance are the responsibility of the cloud provider. This may come as a surprise to providers, many of whom state explicitly in their service agreements that responsibility for data management falls on their customers’ shoulders. At best, downtime is compensated in the form of reduced rates or free service periods in ensuing weeks or months, but compensation for lost data or lost sales is unheard of.

    But while the cloud is viewed as a fairly safe alternative to traditional infrastructure, mainly through its ability to shift workloads and mirror data to alternate sites with relative ease, failures do occur. And as the Uptime Institute’s Matt Stansberry and Lee Kirby note, this is not always due to unforeseen circumstances or unlucky, random occurrences. The vast majority of outages are due to highly predictable events like hardware failure and human error – things that the IT industry should have solved years ago. Whether it is a blown circuit or a mistyped keystroke, the fact that these minor issues are allowed to cascade into significant, prolonged outages is a failure of management, not of technology or personnel.

    Sometimes, the very systems that are intended to safeguard critical infrastructure cause outages simply by working exactly as designed. A case in point is the recent shutdown of Azure infrastructure in Europe, which began with a release of flame retardant during a routine test. That release prompted an automatic shutdown of air-handling units, which in turn caused data equipment to power down as temperatures rose. Service was down for about seven hours, although customers who had signed up for Microsoft’s Availability Sets feature retained access to isolated hardware clusters that were not affected.

    This is evidence that the cloud’s ability to improve disaster recovery (DR) can benefit both on-premises and cloud-based infrastructure, but only if the enterprise manages it in a continuous, proactive manner. ISG consultant Dr. Cindy LaChapelle notes that organizations with a “set it and forget it” mentality toward DR often place themselves in an even worse position than those with no DR program at all, because they operate under a false sense of security that their outdated system offers adequate protection. To get it right, organizations need to view DR as an operational mandate, with constant testing and adjustments to ensure both local and third-party infrastructure can provide failover with minimal downtime.

    Of course, cloud providers are responsible for their infrastructure and for ensuring that resources are there when you need them. But when the inevitable outage occurs – and surveys indicate this should happen less frequently in the cloud than on traditional infrastructure – the enterprise is still responsible for maintaining IT services for its users.

    When mission-critical data is on the line, it is the enterprise mission that is most vulnerable, not the cloud provider’s.

    Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.

    With more than 20 years of experience in technology journalism, Arthur has written on the rise of everything from the first digital video editing platforms to virtualization, advanced cloud architectures and the Internet of Things. He is a regular contributor to IT Business Edge and Enterprise Networking Planet and provides blog posts and other web content to numerous company web sites in the high-tech and data communications industries.
