The Science and Art of Writing an IT/DR Recovery Plan

There are two main aspects to writing a disaster recovery plan: the science of knowing what to include and the art of writing the content. Today’s blog takes a look at both elements.

The two main tasks in writing an IT/disaster recovery plan are determining what the plan should cover and actually writing the language to fill out the different sections.

Most people focus on the first part and neglect the second part. This leads to a situation where most plans cover the correct topics, but the coverage is so vague and wordy as to make the plans unusable.

One of the most popular online searches of business continuity–related topics is for “recovery plan template.” This is most likely because of a common belief that having the right recovery plan template amounts to a magic bullet. If you have an adequate template, the thinking goes, all you have to do is pad it out with some verbiage and you are all set.

In fact, if you take an excellent recovery plan template and fill it with garbage, your recovery plan will be garbage. The outline of the plan is important, but how you fill in the outline is critical if you want to make the plan clear, concise, pertinent, and actionable.

There are a lot of good templates out there, but a template is not a plan. A usable plan needs the right level and type of detail. In the end, what matters most is not the format; it is the content.

 

Two Preliminary Points

Before we turn to the topics of what to include in a recovery plan and how to write the sections, let’s get two preliminary matters out of the way:

First, this article is not intended for people who simply want to be able to check off a box saying that, yes, they have an IT/DR recovery plan. It’s for people who want their recovery plans to actually be of use to the organization in the event of an outage.

Second, in talking about recovery plans it is common to speak as though an organization only needs one plan or document; however, most organizations actually need several. Determining what documents are needed is part of the art of writing recovery plans, a topic we’ll get to in a moment.

 

Sample Recovery Plan Table of Contents

As mentioned above, there are many good recovery plan templates out there. Below is a sample recovery plan table of contents that conveys the same basic information as a template: 

1. PLAN OVERVIEW 

a. System / Application Description 

b. Recovery Strategy 

c. Recovery Team Member Roles and Skill List 

2. ENVIRONMENTAL AND INFRASTRUCTURE RESTORATION   

a. Environmental and Infrastructure Recovery Action Tasks

b. Environmental and Infrastructure Restore/Recovery Procedures or Scripts

3. FUNCTIONAL/APPLICATION RESTORATION 

a. Functional/Application/Database Recovery Action Tasks and Dependencies  

b. Database Restore/Recovery Procedures or Scripts

c. Application Restore/Recovery Procedures or Scripts 

4. DATA RECOVERY/SYNCHRONIZATION AND BUSINESS VALIDATION  

a. Data Recovery/Synchronization Action Tasks 

b. Data Recovery/Synchronization Procedures or Scripts 

c. Business Validation Tasks 

d. Business Validation Procedures and Scripts

5. RESUME PROCESSING 

a. Action Tasks for automated or batch processing 

b. Post Turnover Tasks 

c. Deviations to Normal Operating Procedures 

6. APPENDICES  

a. Plan Information 

b. Offsite Storage Information 

c. Backup and/or Replicated Information 

d. Hardware Mapping Between Primary and Failover Site 

e. Hardware/Software Primary Site Configurations and Requirements 

f. Hardware/Software Failover Site Configurations and Requirements

g. Department Work Requirements (Forms/Documentation/Workstations) 

h. Critical Functions and Inputs/Outputs/Dependencies  

i. Recovery Timeline and Plan Assumptions 
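
The outline above can also be held as a simple data structure and used to spot sections of a draft that are still empty. The following is a minimal sketch in Python, not a tool MHA Consulting uses: the names OUTLINE, missing_sections, and draft are hypothetical, only two of the six sections are shown for brevity, and the draft text is invented for illustration. A check like this confirms only that a section has been filled in; it says nothing about whether the content is clear or actionable.

```python
from typing import Dict, List

# Two sections copied from the sample table of contents (abbreviated for brevity).
OUTLINE: Dict[str, List[str]] = {
    "Plan Overview": [
        "System / Application Description",
        "Recovery Strategy",
        "Recovery Team Member Roles and Skill List",
    ],
    "Resume Processing": [
        "Action Tasks for automated or batch processing",
        "Post Turnover Tasks",
        "Deviations to Normal Operating Procedures",
    ],
}


def missing_sections(
    outline: Dict[str, List[str]], draft: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return 'Section / Subsection' labels whose draft text is absent or blank."""
    missing = []
    for section, subsections in outline.items():
        for sub in subsections:
            text = draft.get(section, {}).get(sub, "")
            if not text.strip():
                missing.append(f"{section} / {sub}")
    return missing


# Hypothetical draft content, for illustration only.
draft = {
    "Plan Overview": {
        "System / Application Description": "Order-entry application and its database.",
        "Recovery Strategy": "   ",  # whitespace only, so it counts as empty
    },
}

for label in missing_sections(OUTLINE, draft):
    print("EMPTY:", label)
```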

 

The table of contents above is a high-level description of the contents of an IT/DR recovery plan as MHA Consulting prepares them. We might describe this as the science of what goes into a plan. Let’s move on to the art of writing such a plan, which, as previously explained, is both critical and frequently overlooked.

 

The Art of Writing a Recovery Plan

The art of writing disaster recovery plans consists in working out what plans an organization needs and in writing the individual sections so that they contain pertinent, concise, actionable information. That information needs to be at the correct level of detail for the person likely to perform the actions.

Most recovery plans contain too much explanatory, policy-type information. This might suffice to pass an audit, but it is usually of limited use in helping someone recover an interrupted process.

Explanations of strategy, policy, and purpose belong in the appendices.

If your recovery plan is full of paragraphs of narrative writing, something is wrong.

A recovery plan should mostly be in the form of checklists. It should state the things that need to be done to effect the recovery. Think pilot’s pre-flight checklist or surgeon’s pre-operation checklist. A recovery plan will also include the procedures to support the checklist. Time is precious during a recovery and information should be limited to what is necessary.
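
To make the checklist idea concrete, here is a minimal sketch in Python of a recovery checklist modeled as an ordered list of short action tasks, each pointing to a supporting procedure where one exists. Everything in it is an assumption made for illustration: the Task and Checklist classes, their fields, the task wording, and the PROC-01 style procedure IDs are hypothetical and do not come from any MHA template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """One checklist line: a short action, who performs it, and where the detail lives."""
    action: str
    role: str
    procedure_ref: str = ""  # hypothetical ID of a supporting procedure
    done: bool = False


@dataclass
class Checklist:
    title: str
    tasks: List[Task] = field(default_factory=list)

    def render(self) -> str:
        """Return the checklist as plain text, one numbered line per task."""
        lines = [self.title]
        for number, task in enumerate(self.tasks, start=1):
            box = "[x]" if task.done else "[ ]"
            ref = f" (see {task.procedure_ref})" if task.procedure_ref else ""
            lines.append(f"{box} {number}. {task.role}: {task.action}{ref}")
        return "\n".join(lines)

    def remaining(self) -> List[Task]:
        """Tasks not yet completed, in execution order."""
        return [task for task in self.tasks if not task.done]


# Hypothetical example content, for illustration only.
checklist = Checklist(
    title="Database restoration",
    tasks=[
        Task("Confirm failover site storage is mounted", "DBA", "PROC-01"),
        Task("Restore latest full backup", "DBA", "PROC-02"),
        Task("Notify application team that database is online", "Recovery lead"),
    ],
)
checklist.tasks[0].done = True
print(checklist.render())
```

Keeping each task to a single short action, with the detail pushed into a referenced procedure, mirrors the split between checklist and supporting procedures described above.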

 

For tips on writing checklists, see MHA Consulting CEO Michael Herrera’s article: The 4-3-3 Rule for Writing Business Recovery Checklists

 

Another important aspect of the art of writing recovery plans is including the correct level of detail. Getting this right is a matter of thinking about who is likely to use the plan and orienting your information to a person with that level of knowledge.

Plan writers commonly make one of two mistakes. Often, they assume that the person effecting the recovery will be themselves or another primary expert on the environment. Since they already know all about the process, they don’t see the need to spend time documenting the details.

At other times, plan writers think they must create plans so detailed they could be performed by any random person coming in off the street.

These two errors result in plans that are cumbersome at best and useless at worst.

Plans should be written at the level of a person who is a competent professional in the field but an outsider to the organization or department. There is no need to detail procedures that everyone in the field knows how to perform. But institutional information such as non-standard technical configurations or naming conventions should be included.

 

Combining Science and Art

Writing good IT/DR recovery plans requires a combination of science and art. The writer needs to know what topics to cover, which can be learned by consulting publicly available recovery plan templates or the sample table of contents included above.

The writer also has to have a good grasp of the art of writing recovery plans. This means understanding what plans the organization needs, being able to distill the information for each plan into a concise checklist, and being able to intuit and include the correct level of detail, namely enough to permit an expert from outside the department to perform the recovery.

By combining science and art in this way, a writer of recovery plans can produce plans that will truly be of value when and if critical processes at the organization go down and need to be recovered under emergency conditions.

 


About
Richard Long
Richard Long is one of MHA’s practice team leaders for Technology and Disaster Recovery related engagements. He has been responsible for the successful execution of MHA business continuity and disaster recovery engagements in industries such as Energy & Utilities, Government Services, Healthcare, Insurance, Risk Management, Travel & Entertainment, Consumer Products, and Education. Prior to joining MHA, Richard held Senior IT Director positions at PetSmart (NASDAQ: PETM) and Avnet, Inc. (NYSE: AVT) and has been a senior leader across all disciplines of IT. He has successfully led international and domestic disaster recovery, technology assessment, crisis management and risk mitigation engagements.