Disaster Recovery 101: Don’t Make This Common Mistake

Richard Long

Responding to a disaster is a three-phase process, but in implementing this process business continuity professionals commonly make a serious mistake. In today’s post, we’ll tell you what that common disaster recovery mistake is and give you a disaster recovery 101 review to show you how to avoid it.

The three phases of responding to a disaster are:

  1. Assessment
  2. Restoration
  3. Application Validation/Data Updates

Let’s look at each in detail.

PHASE 1: ASSESSMENT

The Assessment phase is all about taking stock. There’s been a disruption. There are impacts to your organization, some of which you know, some you might not know. There might be continuing threats to your people, facilities, IT, and business processes. The situation is probably murky and confusing.

In this first phase of responding to an emergency, you focus on determining exactly what is going on.

The following are some of the questions you should ask during the Assessment phase:

  • What is the current state of business functions?
  • What is the impact of the event on the organization’s IT services?
  • What are the priority levels of the impacted services (so you know which to restore first)?
  • For services that have not yet been impacted, is there any risk they could be impacted as the event unfolds?
  • For downed services, how long are the outages expected to last?
  • What steps are required to restore services at the primary data center?
  • For affected business processes, are there any workarounds that can be used to restore functionality?
  • Is a restoration of IT functioning at an alternate site necessary? (The answer might depend on when services can be restored at the primary location.)
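
The prioritization and alternate-site questions above can be made concrete with a short sketch. The Python below is illustrative only: the service names, priority tiers, outage estimates, and the single tolerable-downtime threshold are all hypothetical, and a real assessment would take these values from your business impact analysis.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str                # hypothetical service name
    priority: int            # 1 = most critical (assumed tiering)
    impacted: bool           # is the service currently down or degraded?
    est_outage_hours: float  # estimated outage at the primary site


def restoration_order(services):
    """Return impacted services, most critical first; longer outages break ties."""
    impacted = [s for s in services if s.impacted]
    return sorted(impacted, key=lambda s: (s.priority, -s.est_outage_hours))


def needs_alternate_site(services, max_tolerable_hours):
    """Name impacted services whose estimated outage exceeds the tolerable downtime."""
    return [s.name for s in services
            if s.impacted and s.est_outage_hours > max_tolerable_hours]


# Example data, invented for illustration.
services = [
    Service("payroll", 2, True, 6.0),
    Service("order-entry", 1, True, 12.0),
    Service("intranet", 3, False, 0.0),
    Service("email", 2, True, 24.0),
]

order = [s.name for s in restoration_order(services)]
alternate = needs_alternate_site(services, max_tolerable_hours=8.0)
print(order)
print(alternate)
```

Using one threshold for every service is a simplification; in practice each service would likely carry its own recovery time objective.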

PHASE 2: RESTORATION

In the Restoration phase, you are taking action. You are vigorously following the steps in your recovery plan, doing what’s needed to restore your business processes and IT systems.

Here are some of the key steps you should be taking in the Restoration phase:

  • Ensure there is an overall restoration coordinator.
  • Review and orient the restoration teams on logistics and expectations regarding following your recovery plans and reporting problems. (Without such an orientation, chaos can break out.)
  • Ensure the organized and disciplined execution of restoration tasks.
  • Remind the teams to perform tasks based on the plan and not their memory.
  • Ask for updates; follow up with team leads or managers to ensure tasks are on track.
  • Promote active time management. Track how much time it takes to troubleshoot and resolve each issue. Otherwise, identifying or resolving an issue can take up time out of proportion to its importance, delaying work on other problems to the detriment of an effective and efficient recovery.
  • Define and perform regular restoration updates, including issue status of both IT and non-IT departments.
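
One simple way to practice the active time management described above is to give each issue a time box and flag the ones that have exceeded it. The sketch below assumes a hypothetical issue log; the field names, issue titles, and minute budgets are invented for illustration, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    title: str
    minutes_spent: int
    time_box_minutes: int  # assumed budget agreed with the restoration coordinator


def overdue(issues):
    """Return titles of issues that have used up their time box."""
    return [i.title for i in issues if i.minutes_spent >= i.time_box_minutes]


# Example log entries, invented for illustration.
issues = [
    Issue("DNS not resolving at alternate site", 50, 30),
    Issue("Report server license error", 10, 45),
]

flagged = overdue(issues)
print(flagged)
```

Flagged issues are candidates for escalation or deferral at the next restoration update, so that one stubborn problem does not stall the rest of the recovery.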

As you’ll see below, the Restoration phase seems to be everyone’s favorite phase.

PHASE 3: APPLICATION VALIDATION/DATA UPDATES

The Application Validation/Data Updates phase is about getting your processes back onto a normal footing and validating that everything is working as it should. The following are some things you should consider as part of this phase:

  • Is there anything that needs to be addressed in terms of application or process-specific changes? Think specifically about interfaces to third parties.
  • Has data been lost? What data needs to be recreated or integrated? Have all the integrations and interfaces self-healed, or is there a need for manual intervention?
  • Do you have to make any adjustments to processing as a result of known performance or capacity issues?
  • Are changes needed due to dependency issues?
  • Are workarounds needed to deal with situations where different systems within a larger, integrated system have come back online at different times? Sometimes critical systems might be restored while less important systems upstream or downstream might remain unavailable.
  • Perform functional validation at the IT and business levels prior to turn-over. It is much easier to identify and correct issues prior to production turn-over than several hours into data and transaction flow.
  • Ensure backups at the alternate location are running and functional. You do not want to have an issue and lose all the work performed during recovery once the environments are in production again. If this happens, you’ll have to restore all over again.
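
The validation items above can be treated as a gate that must pass before production turn-over. The following is a minimal sketch under stated assumptions: the check names are hypothetical, and each check is a stand-in that returns a fixed value where a real one would query the recovered systems.

```python
def ready_for_turnover(checks):
    """Run each named check; return (ok, names_of_failed_checks)."""
    failed = [name for name, check in checks.items() if not check()]
    return (not failed, failed)


# Hypothetical checks with hard-coded results, for illustration only.
checks = {
    "it_functional_validation": lambda: True,
    "business_functional_validation": lambda: True,
    "third_party_interfaces": lambda: True,
    "backups_running_at_alternate_site": lambda: False,
}

ok, failed = ready_for_turnover(checks)
print(ok, failed)
```

Here the gate stays closed because the backup check fails, which reflects the point that turning over to production without working backups puts all of the recovery work at risk.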

THE COMMON DISASTER RECOVERY 101 MISTAKE

So what is the mistake that we see business continuity professionals make over and over again in following the above three-phase recovery process?

They reduce it to a one-phase recovery process. That is, the only phase planned and exercised is Phase 2, Restoration.

The Assessment is ad hoc and Application Validation is assumed to be unimportant or not needed.

This would be like going to the hospital and having a doctor operate on you without pausing to find out what was wrong, then discharge you as soon as the surgery was complete without making sure you were well.

The problem with this approach to DR is it jeopardizes your recovery. It increases the likelihood you will miss things, prioritize the wrong things, and perform tasks that aren’t necessary.

It also means your recovery probably won’t work as expected and is more likely to contain major gaps that might be difficult or impossible to fix during an event.

PERFORM ALL 3 PHASES

A common mistake BC professionals make is to skip the Assessment phase, blaze through their Restoration procedures, and then call it a day without bothering to perform the Application Validation/Data Updates phase. Don’t make the same mistake.

Disaster recovery is a three-phase process, not a one-phase one.

By including all three phases as part of the planning for the DR process, you increase the chances that the recoveries you oversee will be sound, functional, and efficient – and not come back to embarrass you and hurt your organization.

About
Richard Long is one of MHA’s practice team leaders for Technology and Disaster Recovery related engagements. He has been responsible for the successful execution of MHA business continuity and disaster recovery engagements in industries such as Energy & Utilities, Government Services, Healthcare, Insurance, Risk Management, Travel & Entertainment, Consumer Products, and Education. Prior to joining MHA, Richard held Senior IT Director positions at PetSmart (NASDAQ: PETM) and Avnet, Inc. (NYSE: AVT) and has been a senior leader across all disciplines of IT. He has successfully led international and domestic disaster recovery, technology assessment, crisis management and risk mitigation engagements.