As the East Coast of the U.S. rotated toward the sun this morning, the scene resembled something that might have been part of Ursula K. Le Guin's planet 'Winter,' in her defining novel, 'The Left Hand of Darkness.' Everywhere you looked there was snow and ice and, at first glance, little else. Washington was buried in about three feet of snow. Everything was closed, including the U.S. government.
Leaving aside for now the question of whether having the government shut down is good or bad, the real key was that everything had stopped. As services were slowly restored, companies were finding that they had to deal with more than just cold offices. In many cases, data centers that had been without power for three or four days had exhausted their emergency fuel supplies. Other data centers didn't have such supplies and had stopped working even earlier.
Unfortunately, a weather event such as the Blizzard of 2010 makes it impossible for an IT staff to keep things running. If they're prepared, they will have had a chance to shut everything down in an organized fashion. If not, they'll find a data center where they don't really know what the state of the infrastructure might be. Perhaps the servers were triggered to shut down gracefully, and perhaps they weren't.
According to Kevin Burke, Critical Systems Manager for American Power Conversion, which is part of Schneider Electric, the worst thing that can happen to a data center is for power to go away in a hurry. 'The second worst thing is to have it shut down, and then come back right away,' he said.
Burke said that if the data center staff thinks there's a good chance their data center will have an outage, it's better to shut it down in an organized fashion so that everything is in a known state, and the infrastructure is preserved. He noted that some companies try to avoid this by having enough fuel on site to last for several days, or scheduling deliveries ahead of time, although in situations such as the one in Washington, it doesn't help if the roads are too bad for fuel deliveries.
Burke noted that it's equally important that the data center be brought back online in a manner that's as organized as the shutdown, if not more so. He told me that you can't just turn everything on and expect it to work. When you start to turn on the transformers that feed the power distribution units (PDUs), they exhibit a phenomenon known as 'inrush,' in which a device requires six to 10 times as much power to start as it does to run. If everything is turned on at once, the combined demand exceeds what the uninterruptible power supplies (UPSs) can deliver, causing them to shift into bypass mode.
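To make the "six to 10 times" figure concrete, here is a rough back-of-the-envelope sketch. Every number in it (per-transformer draw, unit count, UPS rating) is an assumption chosen for illustration, not a figure from Burke or APC, and real inrush depends on the equipment involved.

```python
# Illustrative figures only; none of these numbers come from Burke or APC.
RUNNING_KW_PER_TRANSFORMER = 15.0  # assumed steady-state draw of one unit
TRANSFORMER_COUNT = 8              # assumed number of units in the room
UPS_CAPACITY_KW = 500.0            # assumed UPS rating


def startup_demand_kw(running_kw: float, count: int, inrush_multiple: float) -> float:
    """Momentary demand if `count` identical units are energized at the same instant."""
    return running_kw * inrush_multiple * count


steady_kw = RUNNING_KW_PER_TRANSFORMER * TRANSFORMER_COUNT
print(f"steady state: {steady_kw:.0f} kW")

for multiple in (6, 10):  # the "six to 10 times" range quoted above
    demand = startup_demand_kw(RUNNING_KW_PER_TRANSFORMER, TRANSFORMER_COUNT, multiple)
    verdict = "exceeds" if demand > UPS_CAPACITY_KW else "fits within"
    print(f"{multiple}x inrush: {demand:.0f} kW {verdict} the {UPS_CAPACITY_KW:.0f} kW UPS")
```

With these assumed values, a room that runs comfortably at 120 kW would ask for 720 to 1,200 kW if every transformer were switched on together, well past the assumed 500 kW rating.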
Instead, Burke said that you start by turning all of the power infrastructure on gradually, letting a few transformers and PDUs come online at a time. Only after the power infrastructure is online do you start your IT equipment, starting with the infrastructure, then other items in the order required by the software and applications that are running.
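The gradual power-up Burke describes can be sketched as a simple batching plan. This is a minimal illustration, not APC's procedure: the unit names, loads, UPS capacity, and the worst-case 10x multiplier are all hypothetical, and the function `plan_batches` is invented here. It assumes each batch settles to its running load before the next batch is switched on.

```python
def plan_batches(running_kw, ups_capacity_kw, inrush_multiple=10.0):
    """Group units into switch-on batches.

    running_kw: dict of unit name -> assumed steady-state draw in kW.
    A unit joins the current batch only if the load already running, plus
    the inrush of every unit in the batch, stays within the UPS capacity.
    """
    batches, current = [], []
    settled_kw = 0.0        # running load of units from earlier batches
    batch_inrush_kw = 0.0   # momentary demand of the batch being built
    batch_running_kw = 0.0  # what that batch will draw once it settles

    for name, kw in running_kw.items():
        inrush = kw * inrush_multiple
        if current and settled_kw + batch_inrush_kw + inrush > ups_capacity_kw:
            # Close the batch and let its units settle to steady state.
            batches.append(current)
            settled_kw += batch_running_kw
            current, batch_inrush_kw, batch_running_kw = [], 0.0, 0.0
        if settled_kw + inrush > ups_capacity_kw:
            raise ValueError(f"{name} cannot be started within UPS capacity")
        current.append(name)
        batch_inrush_kw += inrush
        batch_running_kw += kw

    if current:
        batches.append(current)
    return batches


# Hypothetical inventory, listed in the order it should be energized.
units = {
    "XFMR-A": 15.0, "XFMR-B": 15.0, "XFMR-C": 15.0, "XFMR-D": 15.0,
    "PDU-1": 10.0, "PDU-2": 10.0,
}
for step, batch in enumerate(plan_batches(units, ups_capacity_kw=400.0), start=1):
    print(f"step {step}: switch on {', '.join(batch)}")
```

With these made-up numbers the plan comes out as three steps of two units each. A real plan would also have to respect breaker ratings upstream and the manufacturer's own startup guidance.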
Burke said it may take a series of manual steps to get everything online without tripping your breakers upstream. He also noted that you won't be operating in a fully protected condition until the batteries recharge in the UPS. He said a recharge usually takes 10 times as long as the device's run time.
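Burke's rule of thumb for recharge time lends itself to a quick estimate. The sketch below assumes the batteries were fully drained and that the 10x figure holds for your UPS; the restore time and the 15-minute run time are hypothetical values, and the function name is invented for this example.

```python
from datetime import datetime, timedelta

RECHARGE_MULTIPLE = 10  # Burke's rule of thumb: recharge takes about 10x the run time


def estimated_full_protection(power_restored: datetime, runtime_minutes: float) -> datetime:
    """Estimate when a fully drained UPS is back to full battery protection."""
    return power_restored + timedelta(minutes=runtime_minutes * RECHARGE_MULTIPLE)


restored = datetime(2010, 2, 10, 8, 0)  # hypothetical time utility power returned
ready = estimated_full_protection(restored, runtime_minutes=15)
print(ready)  # 2010-02-10 10:30:00
```

In other words, a UPS rated for 15 minutes of run time would leave you partly exposed for roughly two and a half hours after power returns.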
Once everything is running, it's important that you test your emergency equipment right away to make sure it's still in good condition. You'll need to see if your generator needs service, for example, and you'll need to get more fuel. Burke also pointed out that you will need to test your air conditioning chillers before you bring up the IT equipment in your data center.
There are steps you can take to minimize the problem, such as staying aware of potential threats and maintaining lines of communication. This means keeping the Weather Channel tuned in, and making sure you have an emergency phone on a hard-wired POTS line (VoIP doesn't work when the power is out). Burke added that new high-efficiency infrastructure will help prolong run time because it uses less energy. Finally, he said, exercise your generator under load regularly (something I didn't do) so that you will know you can rely on it.
For those of us in the DC area, it's time to start bringing up our data centers. We hope that it will be more organized than the shutdown.