The Step-by-Step Guide to Disaster Recovery

We’ve talked about cataloging personnel and items, configuring systems to protect against data loss, and setting up sites to accommodate failed-over data and displaced employees.
Now, we need to establish the processes that people will follow during and after a disaster. We covered the topic of downtime procedures earlier. We mainly intend those for times when the system is offline but recoverable.
Your disaster recovery business process must accommodate failure of greater magnitude. You can use the downtime procedures that you developed as a starting point and as an idea generator, though.

Incident Response

Businesses encounter challenges every day. Executives and staff quickly learn how to prioritize and handle the problems that they face. On a typical day, their difficulties fall in line with normal expectations. Events outside the norm take extra time to understand and adjust to.
The time needed scales with the degree that an occurrence skews from normal and familiar. If staff don’t know who to contact, that compounds the problem.
To smooth handling of emergencies, organizations need to build an incident response process. Larger organizations often have designated incident response teams. Whether assigned to an individual, a team, or collectively to everyone, incident response begins with triage.
Members of an incident response team might not know what to do, but they must know who to involve. Relaying information to the incident response team usually happens automatically as employees pass news up their reporting chain. Eventually, it reaches someone who knows how to activate the response process.
An incident response team should include at least one, preferably two, members from every department. As organizations subdivide, the response team grows. When activated, the team should collaborate as quickly as possible. 

They need to decide on questions such as the following:

  • Can a single department or subgroup handle the incident? 
  • Will this event impact other departments or subgroups? 
  • Has the problem caused downtime? 
  • Will downtime continue? 
  • Does the team need to send broad notifications to employees? 
  • Should staff reach out to customers? 
  • Who will address the problem? 
  • How will staff involved directly in a solution send updates to the response team? 
  • How will the response team update employees or customers? 

A problem that necessitates involvement from an incident response team often works much like a planned project. If you have experienced project managers, appoint one or more to serve on the team.
Effective incident response requires participation. Establish clear procedures for designating alternates. A vacation or illness should not prevent a rapid solution to an unexpected event. 

Executive Declaration 

Enacting downtime and disaster recovery processes has associated costs. Personnel cease carrying out their normal functions and shift into their alternative emergency roles. Switching from a primary data system to a replica has time and risk implications mentioned in the relevant section.
Equipment and inventory inspection and recovery efforts will accrue liabilities and debt, as will calling in contractors for any tasks. To keep things in order, categorize three levels of event response:

  • Define activities that occur during and immediately after a crisis. These should require minimal or no supervisory guidance. This level includes items such as moving everyone to safety, notifying authorities, and beginning low-impact downtime procedures. 
  • Create a “downtime” operational level. Because switching to and from downtime operations incurs time and risk, clarify that it can only happen when indicated by staff with a particular level of authority. This would not include any low-impact activities that you included in the first level. 
  • Specify a “response and recovery” operational level. This involves accounting for all personnel, relocating and failing over to alternative sites, and implementing equipment and data recovery processes.

The indicated names are arbitrary; use anything that makes sense. The important part is defining responses in advance, during a calm period, so that staff have fewer problems to solve during an emergency. Having predefined levels also helps to reduce improper reactions, such as bringing a replica online while the primary site still functions.
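As a rough illustration, the three levels and the authority needed to declare each one could be written down as data. This is only a sketch: the level names, the role names, and the mapping between them are assumptions made for the example, and your own organization chart should replace them.

```python
from enum import IntEnum


class ResponseLevel(IntEnum):
    """Three event response levels; the names are placeholders."""
    IMMEDIATE = 1  # safety, notifying authorities, low-impact downtime steps
    DOWNTIME = 2   # full downtime operations
    RECOVERY = 3   # relocation, failover, equipment and data recovery


# Hypothetical roles, ranked from least to most authority.
ROLE_RANK = {"staff": 0, "manager": 1, "executive": 2}

# Hypothetical policy: the lowest role allowed to declare each level.
MIN_ROLE = {
    ResponseLevel.IMMEDIATE: "staff",
    ResponseLevel.DOWNTIME: "manager",
    ResponseLevel.RECOVERY: "executive",
}


def may_declare(role: str, level: ResponseLevel) -> bool:
    """Return True if someone with this role may declare the given level."""
    return ROLE_RANK[role] >= ROLE_RANK[MIN_ROLE[level]]
```

With this mapping, `may_declare("staff", ResponseLevel.IMMEDIATE)` is true, while `may_declare("staff", ResponseLevel.DOWNTIME)` is false. Keeping the policy in one small table makes it easy to print and post next to the written procedures.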

Preparing and Planning for Impacted Personnel

First, your business continuity plan must cover the human aspect. It needs to provide actions and guidance, both for the people that enact the plan and for the people impacted by whatever condition caused the plan to go into effect.

Predicting user impact 

Disaster recovery plans tend to be sterile and to focus on the business, assets, and data. While all of that probably takes up the most space in the plan, none of it is as important as the people. Employee safety needs to top all priority lists. 

The plan will include a great deal of content on what processes to follow. That will help to keep staff focused, but at all phases, everyone involved in planning needs to remember that crisis conditions look nothing like a typical day at work.
You can make some predictions on the sorts of disasters that your business would be most likely to face, but that has limited value. Most people do not know how they’ll react to a catastrophe until they face one. There is no such thing as a “normal” response. Some will focus and work well under pressure; others will not. 

People will be scared, in shock, injured, or have any of a number of other adverse responses. Afterward, the effects can linger. The death or serious injury of a coworker can traumatize others. 

While you have no way to know exactly what will happen, you can plan with the expectation that anyone who needs to put the plan into action will be at a disadvantage.

Tips for effective response plans:  

  • Keep all instructions short and clear; 
  • Do not assume that anyone enacting your plan understands corporate or departmental jargon and colloquialisms; 
  • Use acronyms and mnemonics for disaster response training. In documentation to follow during a response, clearly spell out any acronyms or symbols; 
  • Employ iconography. For instance, if process B depends upon the completion status of process A, use a large icon of a stop sign or similar callout at the end of process A; 
  • Where iconography does not suitably attract attention, use textual clues. For instance, use large “Warning” boxes in a bold color with large fonts.

Research or brainstorm acronyms for problems that are likely to occur and that require uncommon activities. For instance, most people have never used a fire extinguisher. You might create literature on using fire extinguishers that includes the common “PASS” acronym.

Then have pictures, or better yet, a video that matches “P” to pulling the extinguisher’s pin, “A” to aiming at the base of the fire, the first “S” to squeezing the trigger, and the final “S” to sweeping the nozzle back and forth.

If you include directions with extinguishers (highly recommended), you can have a short tag with these items spelled out. Do not assume that anyone remembers (or even attended) the training.

You can create your own acronyms. As an example, you could create a fire protocol and call it “The three Es (EEE): Extinguish, Evacuate, Escape”. Your training would expand these to “extinguish the fire if possible”, “evacuate others”, and “escape yourself”. If drilled, people have a better chance of remembering what to do when they have some simple mnemonics to work with. 

Do not overuse these memory tools. For instance, if you search the Internet for “emergency response acronyms”, you will find lists that contain government agencies, response programs, and common phrases used when response personnel communicate with others. 

People who work in disaster response full time might remember these, but no one else will. Have only a few and try to have them on printed literature near any equipment that relates to the situation that they address. 

Above all, remember that some catastrophes affect more than just your business. Some of your staff may have had their lives upended. Many will have things in their own lives to recover from. Business continuity planning must include flexibility for employees.

Working with displaced employees

Catastrophes can render a site unusable for a significant period of time. Plan in advance what the employees will do.

If you redirect staff to an alternative site, ensure that everyone knows the location. Include a reminder in the notification system. Importantly, have someone verify the viability of the site before sending everyone there. You can use an initial message that informs everyone of the situation and instructs them to wait for further notifications. Once someone deems the secondary site usable, send a follow-up notification.
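The two-step notification described above can be sketched as a small gate that refuses to send the relocation message until someone has verified the site. The class name, message wording, and site name are illustrative assumptions; a real notification system would replace the in-memory list with actual delivery.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RelocationNotifier:
    """Hypothetical helper that holds the follow-up until the site is verified."""
    site_name: str
    site_verified: bool = False
    sent: List[str] = field(default_factory=list)  # stands in for real delivery

    def send_initial(self) -> None:
        # First message: describe the situation and ask everyone to wait.
        self.sent.append(
            "An incident has occurred. Please wait for further notifications."
        )

    def mark_site_verified(self) -> None:
        # Called only after someone has checked the alternative site in person.
        self.site_verified = True

    def send_follow_up(self) -> bool:
        # Second message: only goes out once the site is confirmed usable.
        if not self.site_verified:
            return False
        self.sent.append(f"Please report to {self.site_name}.")
        return True
```

Calling `send_follow_up()` before `mark_site_verified()` returns `False` and sends nothing, which mirrors the rule that nobody is redirected to a site that has not been checked.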

Remember that, just like surviving a disaster, an interrupted work routine causes distress. People will arrive late, get lost, and need to leave at atypical times of the day to reach appointments that normally needed only a few minutes of travel time. Plans should expect erratic attendance patterns while employees adjust.

Working with offsite employees

Many positions began transitioning to remote work years ago. The need for isolation brought on by the COVID-19 crisis dramatically accelerated that transition. 

As long as remote employees still have some system to connect to and were otherwise not impacted by the event, little changes for them. Include them in communications about the situation and remember that the conditions will have some effect on them. 

You may choose to have some employees who would normally commute to a physical location begin working from home. An effective transition from on-premises to at-home work requires a substantial amount of advance planning, especially if you do not currently have a formal remote work policy.

Your organization will need to answer many questions:

  • Do employees use their own hardware? 
  • Does the company provide equipment? 
  • Does the company reimburse employees’ expenses? 
  • How will remote employees maintain communication? 
  • Will you pay for a premium collaboration service, such as Zoom? 
  • Will you enforce a requirement of a particular service? 
  • Will work hours change? Flex? 
  • Will users connect via a VPN? VDI? Microsoft Remote Desktop sessions? A Citrix solution? Something else? 
  • Do your systems have sufficient capacity to support the potential number of remote workers? 

Some employers worry that remote workers will be less productive. Studies (“Are Remote Workers More Productive Than In-Office Workers?” and “Is Working Remotely Effective? Gallup Research Says Yes”) have shown that this concern has no basis. However, if the source event was a major disaster, the psychological effects and any damage to employees’ property will impact their work. 

Even without that, transitioning from the office to the home does take time. Build some adjustment flexibility into your plan. 

Notifying and accounting for employees

Your plan should already include notification trees and contact methods. Response documentation must include an accounting system. Small businesses can do this informally. 

Medium-sized businesses can require employees to check in with their supervisors, who in turn report to a central command structure. Large businesses can do the same in a tree structure or make use of dedicated check-in telephone numbers. 
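A tree-structured check-in can be tallied with a short recursive roll-up. The sketch below assumes a simple in-memory reporting tree; the `Node` class and its field names are invented for illustration and do not come from any particular HR or notification product.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One person in a hypothetical reporting tree."""
    name: str
    checked_in: bool = False
    reports: List["Node"] = field(default_factory=list)


def roll_up(node: Node) -> Tuple[int, int]:
    """Return (checked_in, total) for this person and everyone below them."""
    accounted = 1 if node.checked_in else 0
    total = 1
    for report in node.reports:
        sub_accounted, sub_total = roll_up(report)
        accounted += sub_accounted
        total += sub_total
    return accounted, total
```

Each supervisor can run the same roll-up on their own branch and pass only the two numbers upward, so the central command structure sees totals without having to contact every employee directly.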

Define processes for unreachable employees in the context of a widespread disaster. You can use a status such as “unknown”, but that cannot be a final disposition after attempting a single phone call. Establish a schedule for retries. When multiple attempts to locate an employee fail and you can no longer devote resources to them, report them as missing to the authorities.

To reiterate, do that only when you have reason to believe that the person might be in danger. Do not call the police if a systems administrator doesn’t answer a text message about a server crash. While that might seem obvious, make the conditions very clear in your documentation. 
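One way to make those conditions explicit is to write the retry schedule and the escalation rule down as a small decision function. The delays, the status strings, and the `possibly_in_danger` flag below are all assumptions chosen for the example; set your own values in your documentation.

```python
# Hypothetical schedule: minutes to wait before each retry after a failed attempt.
RETRY_DELAYS_MINUTES = [15, 60, 240]


def next_action(failed_attempts: int, possibly_in_danger: bool) -> str:
    """Decide the next step after a number of failed contact attempts."""
    if failed_attempts == 0:
        return "contact now"
    if failed_attempts <= len(RETRY_DELAYS_MINUTES):
        delay = RETRY_DELAYS_MINUTES[failed_attempts - 1]
        return f"status unknown; retry in {delay} minutes"
    # Retries are exhausted. Involve the authorities only when there is
    # reason to believe the person might be in danger.
    if possibly_in_danger:
        return "report missing to authorities"
    return "status unknown; follow up through normal channels"
```

After a single failed call, the function keeps the employee at “unknown” and schedules a retry. The report to the authorities only appears once the schedule runs out and the danger flag is set, so an unanswered text about a server crash never reaches that branch.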

Design Guidelines for Business Continuity Processes 

The overall goal of this article is to cover the role and importance of people in a disaster recovery plan. Use this information as a starting point and guidance system for building your own documentation. The actual processes to include must come entirely from your business experts. Start with these high-level points: 

  • Guidelines for managers and executives to decide between a short interruption that warrants no major response, an event that justifies switching to full downtime procedures, and a genuine disaster that requires an orchestrated response 
  • Minimal and full downtime procedures 
  • Employee immediate response, notification, and accounting procedures 
  • Relocation activities 
  • Remote work policies and practices 
  • Recovery processes

Conclusion

The recovery process portions will need a lot of space. They should only start after the immediate problems have passed. Recovery will include installing replacement systems, restoring data, ordering equipment, organizing contractors, filing insurance claims, notifying customers, and any other activities that staff indicate.  
