Amazon.com, one of the most popular e-commerce sites in the world, experienced a widespread outage that lasted a few hours Tuesday. The site was not taken offline entirely, but blank or partial pages greeted many customers for more than three hours, according to CNET News, making it impossible for them to browse or shop. With annual revenues of nearly $27 billion, Amazon stood to lose roughly $51,400 in potential sales for each minute of downtime, as simple arithmetic shows.
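For readers who want to run the numbers for their own business, here is a minimal sketch of that arithmetic in Python. It assumes revenue is spread evenly across every minute of a 365-day year, which is a simplification since real sales peak at certain hours and seasons. The function names are made up for illustration, and the 180-minute figure is only an assumed stand-in for "more than three hours."

```python
# Rough downtime-cost estimate, assuming revenue is spread evenly
# across every minute of a 365-day year (a simplification).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def revenue_per_minute(annual_revenue):
    """Average revenue earned per minute, in the same currency as the input."""
    return annual_revenue / MINUTES_PER_YEAR


def outage_cost(annual_revenue, outage_minutes):
    """Potential revenue lost during an outage of the given length."""
    return revenue_per_minute(annual_revenue) * outage_minutes


if __name__ == "__main__":
    annual = 27e9  # "nearly $27 billion" from the figures above
    print(f"Per minute: ${revenue_per_minute(annual):,.0f}")
    # 180 minutes is an assumed lower bound for "more than three hours".
    print(f"Three-hour outage: ${outage_cost(annual, 180):,.0f}")
```

Swap in your own annual revenue and outage length to get a ballpark figure for your SMB. Treat the result as an upper bound on lost sales, since some customers simply come back later.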
Beyond calling it a "technical difficulty" that affected the U.S. version of the site, the company was surprisingly tight-lipped about exactly what happened. Of course, what concerns IT folks is not so much the delay to shopping as the fact that the same company runs Amazon Web Services, which pioneered the concept of cloud computing as an ultra-reliable and infinitely scalable platform.
Does This Mean Cloud Computing Can't Be Reliable?
So should we reconsider our plans to use cloud computing as part of our IT infrastructure? I feel that there is no need to panic or to change your SMB's posture toward cloud computing at this point. Unlike the "accidental power problem" that took down another well-known company for more than a day, Amazon's outage was comparatively limited in scope. In addition, the downtime seems to have been limited to the image-processing aspect of the site and affected only the U.S. version.
Ultimately, Amazon has achieved a stellar uptime record over more than a decade in operation. Indeed, this week's outage is only the fourth major one in its 15-year history. A quick rundown of the other three: a 30-minute outage in 1999, a one-hour outage in 2006, and a 90-minute outage in 2008.
That shows that extremely high availability can be achieved. In fact, Amazon has not just "talked the talk" on reliability, but has a decade-and-a-half track record to show for it.
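To put that track record in perspective, here is a rough sketch in Python that turns the major outages into an availability percentage. It counts only the four major outages mentioned in this post, assumes exactly 180 minutes for this week's incident (it was reported as "more than three hours"), and ignores smaller glitches and leap days, so treat the result as a ballpark rather than an official figure. The function name is made up for illustration.

```python
# Ballpark availability from a list of outage durations.
# Assumes only the listed outages count as downtime.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def availability(outage_minutes, years):
    """Fraction of time the site was up over the given number of years."""
    total_minutes = years * MINUTES_PER_YEAR
    return 1 - sum(outage_minutes) / total_minutes


# 1999, 2006, 2008, and this week's outage (180 is an assumed lower bound).
major_outages = [30, 60, 90, 180]
pct = availability(major_outages, 15) * 100
print(f"{pct:.4f}%")  # prints 99.9954%
```

That works out to better than "four nines" over 15 years under these assumptions, which is a useful benchmark when you evaluate the uptime promises of any service provider.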
Focus on Business Continuity Planning
This incident does underscore, however, the continued relevance of business continuity planning. Regardless of the redundancy and care put into achieving uptime on primary systems, the fact is that disasters do occur. When they do, it is imperative that businesses of all sizes have a plan to deal with whatever contingencies arise.
For example, what happens when your retail outlet's Internet connectivity is disrupted, or when the point-of-sale (POS) terminal breaks down? In the former case, which critical business processes require Internet access, and what manual procedures can substitute in the interim?
In the latter case, do staffers have a number to call for immediate assistance, or do they have to struggle through contacting the receptionist, supervisor, general manager and IT manager? For that matter, are vendor contracts (or spare units) in place that define how quickly a faulty unit will be replaced?
Remember, no hardware or service is totally failsafe.
I talked more about the importance of business continuity here, and also gave some practical tips to ensure business continuity from a technical reliability perspective here. In the meantime, I would love to hear of any business continuity tips that you might have for SMBs.