The accompanying 10 slides list 10 common mistakes that disaster recovery teams tend to make. Contributed by Marathon Technologies Corp., they are based on the company’s significant experience in this field. While the observations are pertinent to enterprises and small and mid-sized businesses alike, SMBs may also want to read more on how SMBs Should Approach Disaster Recovery Differently and the Keys to a Successful SMB Disaster Recovery Implementation.
Founded in 1993, Marathon Technologies Corporation is the leading provider of automated, fault-tolerant, high availability solutions for virtual and physical environments.
A lot of companies confuse high availability (HA) and disaster recovery (DR), or implement a DR solution when they really need HA. Put simply, HA is about preventing the everyday failures that cause downtime (network card failure, storage corruption), while DR solutions are designed to help you recover from true disasters (floods, hurricanes), not minor problems.
Implementing disaster recovery software or speaking broadly about ‘what-ifs’ is not enough. The IT team must be well versed in a set plan that has been tested and proven effective. IT staff, as well as upper-level management, should be trained in the DR protocols in the case of any business disruption. In the event of a disaster, team members should already be familiar with the plan and not rely on in-the-moment decision making.
Testing the plan does not guarantee that it will go off without a hitch, but it is an important step in preparing the company for a disaster. After testing, the plan should be scrutinized for any remaining holes and improved accordingly.
Disasters affect the entire business, not just your IT infrastructure. Representatives from all company departments should be involved in the planning process and should know their role in the event of a disaster. In addition, it is imperative to train company executives and decision makers in how to carry out the plan. They should be aware of all protocols, and be involved in testing exercises.
Many technologies actually introduce complexity into the IT environment. For example, clustering technologies may require administrators to painstakingly maintain each server in the cluster to support successful failover. IT organizations instead should find and embrace those technologies that reduce complexity for operational staff – thereby eliminating potential sources of human error.
While it is tough to justify shelling out the extra dough for a top-of-the-line server, it is well worth it on the day that your processor fails. Many IT teams are working with constrained budgets and therefore have to buy lower-priced equipment. This equipment is more likely to fail, increasing the likelihood of future problems.
For example, dual-ported network cards share common hardware logic, and a single card failure can disable both ports. For full redundancy, you need either two separate adapters or a built-in network port combined with a separate network adapter.
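The redundancy check described above can be sketched in a few lines of Python. This is only an illustration, assuming you can list which physical adapter each network port lives on; the port and adapter names are hypothetical, and the function is not part of any vendor tool.

```python
def distinct_adapters(port_to_adapter):
    """Return the set of physical adapters behind the given ports."""
    return set(port_to_adapter.values())


def is_fully_redundant(port_to_adapter):
    """True only if the ports are spread over at least two physical adapters.

    Two ports on one dual-ported card share hardware logic, so they count
    as a single point of failure.
    """
    return len(distinct_adapters(port_to_adapter)) >= 2


# Hypothetical inventories: port name -> physical adapter it belongs to.
dual_ported_card = {"eth0": "card-A", "eth1": "card-A"}
onboard_plus_card = {"eth0": "onboard", "eth1": "card-A"}

print(is_fully_redundant(dual_ported_card))   # False
print(is_fully_redundant(onboard_plus_card))  # True
```

The point of the sketch is that counting ports is not enough; what matters is how many independent pieces of hardware sit behind them.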
Many factors can cause site-wide failures, including an air conditioning failure or leaking roof, a power failure, or a major hurricane. Site disruptions can last anywhere from a few hours to days or even weeks. There are two methods for replicating data across sites. One method is to tightly couple redundant servers across high-speed, low-latency links to provide zero data loss and zero downtime. The other method is to loosely couple redundant servers over medium-speed, higher-latency lines that can span greater distances. In this case, asynchronous data replication maintains a backup copy of the database, which provides a disaster recovery capability: a remote server can be restarted with a copy of the application database that is missing only the last few updates.
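The difference between the two replication methods can be illustrated with a toy model. This is a minimal in-memory sketch rather than a description of any particular product: the class and function names are hypothetical, the "link" is a plain function call, and the tightly coupled case is modeled as a write that is not complete until the remote copy has it.

```python
from collections import deque


class Replica:
    """Remote copy of the database, modeled as an append-only log."""

    def __init__(self):
        self.log = []

    def apply(self, update):
        self.log.append(update)


class TightlyCoupledPrimary:
    """A write completes only after the replica has applied it too."""

    def __init__(self, replica):
        self.log = []
        self.replica = replica

    def write(self, update):
        self.log.append(update)
        self.replica.apply(update)  # waits on the remote copy before returning


class LooselyCoupledPrimary:
    """A write completes locally; updates are shipped to the replica later."""

    def __init__(self, replica):
        self.log = []
        self.replica = replica
        self.pending = deque()

    def write(self, update):
        self.log.append(update)
        self.pending.append(update)  # queued for asynchronous replication

    def ship(self):
        """Send everything queued so far to the replica."""
        while self.pending:
            self.replica.apply(self.pending.popleft())


def missing_updates(primary, replica):
    """Updates the replica would lack if the primary site failed right now."""
    return primary.log[len(replica.log):]


tight_replica, loose_replica = Replica(), Replica()
tight = TightlyCoupledPrimary(tight_replica)
loose = LooselyCoupledPrimary(loose_replica)

for i in range(5):
    tight.write(f"update-{i}")
    loose.write(f"update-{i}")
    if i == 2:
        loose.ship()  # the replication link catches up once, then the site fails

print(missing_updates(tight, tight_replica))  # []
print(missing_updates(loose, loose_replica))  # ['update-3', 'update-4']
```

In the tightly coupled case every write pays the cost of the remote round trip, which is why that method depends on low-latency links. The loosely coupled case tolerates slower, longer lines at the price of losing whatever was still queued when the site went down.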
DR/HA is not one-size-fits-all. Every business has different objectives for different applications. It’s OK to look to others for guidance, but stay focused on your specific goals.
What exactly is it that you need to accomplish? Implementing the wrong solution, or an incomplete one, can waste time and money. Know what clients and users need and adjust the DR plan based on the service levels that need to be met.