The Four Phases of Disaster Recovery

The rise in the use of third-party computing services has given many companies a false sense of security regarding the recoverability of their IT systems. In today’s post we’ll look at why organizations still need to be adept at IT disaster recovery (IT/DR) and describe the four phases of restoring IT services after an outage. 

Knowing How to Recover Is Still Important 

With the migration toward cloud computing and Software as a Service (SaaS), many organizations have grown complacent about IT/DR. Their thinking is that the burden of recovering is now on their IT services vendors. However, most organizations retain some level of on-premises IT capability, and if this can’t be recovered, it might make the vendors’ efforts moot. For many if not most organizations, the need to be able to recover is as great as ever, while the layering of multiple environments has made the job more complicated. 

For these reasons, it’s important that IT departments (and business continuity professionals) make sure their organizations are capable of restoring their IT services after an outage. There are four main phases involved in doing this. Let’s look at them one by one. 

Phase 1: Preparation 

Technically, preparation is not a phase of disaster recovery since it happens before the outage. Practically, however, it might be the most important phase. If you haven’t prepared properly, recovering might be impossible. This is also the area where a lot of people fall short. The following are some things to keep in mind regarding the preparation phase. 

  • Prioritize your services and technologies so you know which to restore first. The usual way of doing this is by conducting a business impact analysis (BIA).
  • Identify which services and technologies your mission-critical services depend on; these will also need to be restored quickly. (Common examples include authentication, access, middleware, and network services.) 
  • A critical part of preparation is DR exercises and testing, to make sure people know what to do and that everything works. 
  • Testing needs to go beyond doing the same tests over and over. (This is another area where people often fall short.) Test across all the different technologies.  
  • Don’t assume your IT services partners have everything covered. Make sure they have all their steps in place and that your recovery is integrated with theirs. 
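
As a rough illustration of the first two points, here is a minimal Python sketch that turns BIA results into a restoration order. The service names, the RTO values, and the rule that a dependency inherits the tightest RTO of anything relying on it are all assumptions made for the example, not a prescribed method.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical BIA output: service -> (RTO in hours, services it depends on)
SERVICES = {
    "order-entry": (4, ["auth", "database"]),
    "reporting": (48, ["database"]),
    "database": (24, ["network"]),
    "auth": (12, ["network"]),
    "network": (24, []),
}


def effective_rtos(services):
    """Give each dependency the tightest RTO of any service relying on it."""
    rto = {name: hours for name, (hours, _) in services.items()}
    graph = {name: deps for name, (_, deps) in services.items()}
    # static_order() lists dependencies before dependents. Walking it in
    # reverse means a service's RTO is final before it is pushed downward.
    for name in reversed(list(TopologicalSorter(graph).static_order())):
        for dep in graph[name]:
            rto[dep] = min(rto[dep], rto[name])
    return rto


def restore_order(services):
    """Order services so dependencies come first and tight RTOs come early."""
    rto = effective_rtos(services)
    sorter = TopologicalSorter({name: deps for name, (_, deps) in services.items()})
    sorter.prepare()
    ready, order = [], []
    while sorter.is_active():
        ready.extend(sorter.get_ready())          # services whose dependencies are up
        ready.sort(key=lambda n: (rto[n], n))     # tightest effective RTO first
        nxt = ready.pop(0)
        order.append(nxt)
        sorter.done(nxt)
    return order


if __name__ == "__main__":
    print(effective_rtos(SERVICES))
    print(restore_order(SERVICES))
```

In this made-up data set, the network and authentication services have relaxed RTOs on paper, but because order entry depends on them they end up at the front of the queue. That is the kind of hidden dependency the preparation phase is meant to surface.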

Phase 2: Assessment 

Now we come to the steps to take after the outage occurs. This is almost always a stressful and confusing period. It’s also the phase where the advice “Don’t just do something, stand there!” applies. That’s because the very first thing to do is figure out what happened and trace the contours of the impact. Here are the main things to consider in this phase: 

  • Before you carry out any restoration activities, conduct an assessment of the situation, risks, and impacts of the event. 
  • Identify the current state of your business functions. Find out which IT services have been impacted and which of their recovery time objectives (RTOs) are affected. 
  • For unaffected services, investigate the risks they face. Is there a chance they might be impacted as the event develops? 
  • Estimate how long the outage will last.  
  • Identify the actions needed to restore services at the primary data center. 
  • Determine which workarounds are needed and whether they function adequately. 
  • Identify the potential processing impacts at the alternate site. Will there be performance impacts or capacity constraints? 
  • Determine whether a restoration at the alternate site is necessary. Depending on when services can be restored at the primary location, it might not make sense to perform a recovery at the alternate site.  
  • Company and IT leaders should make a formal decision either to wait, work on restoring the primary location, or relocate to the alternate site, and then communicate the next steps to the people involved in the recovery effort. 
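
The trade-off behind that final decision can be sketched in a few lines of Python. The field names and the decision rule below are hypothetical; they simply compare the estimated time to restore the primary site against the tightest impacted RTO and the estimated time to bring up the alternate site. The output is an input to the leaders' formal decision, not a substitute for it.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical summary of the assessment phase (all times in hours)."""
    est_primary_restore_hours: float  # estimate to restore at the primary site
    est_failover_hours: float         # estimate to bring up the alternate site
    tightest_rto_hours: float         # shortest RTO among impacted services
    primary_needs_action: bool        # False if the outage should clear on its own


def recommend(a: Assessment) -> str:
    """Return 'wait', 'restore_primary', or 'relocate' under an assumed rule."""
    within_rto = a.est_primary_restore_hours <= a.tightest_rto_hours
    faster_than_failover = a.est_primary_restore_hours <= a.est_failover_hours
    if within_rto or faster_than_failover:
        # Relocating gains nothing if the primary site comes back in time
        # or comes back sooner than the alternate site could be ready.
        return "restore_primary" if a.primary_needs_action else "wait"
    return "relocate"


if __name__ == "__main__":
    print(recommend(Assessment(2, 6, 4, True)))
    print(recommend(Assessment(30, 6, 4, True)))
```

A real decision would also weigh the risks to unaffected services and the capacity constraints at the alternate site noted above; the sketch leaves those out to keep the core comparison visible.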

Phase 3: Restoration 

If a decision is made to relocate to the alternate processing site, then your recovery effort enters the restoration phase. This is where the exercises you conducted in Phase 1 pay off. Consider the following as you embark on restoring your systems: 

  • Review the logistics and expectations of following your recovery plans with the restoration team. Explain the process for reporting issues. (Without a short orientation, chaos can occur.)  
  • Remind the teams to perform tasks based on the plan, not their memory. 
  • Have an overall coordinator for the restoration who actively asks for updates, verifying that tasks are on track.  
  • Track issues and the time spent troubleshooting. Without such tracking, issue resolution is likely to drag on, impeding efficient recovery. 
  • Set a schedule for restoration updates and keep to it, including issuing status updates to all involved departments. 
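
Issue tracking does not need a sophisticated tool. The following Python sketch shows one possible shape for it: each issue records when it was opened, and the coordinator can list open issues that have exceeded a time box. The class, the field names, and the 30-minute time box are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Issue:
    """One problem raised during restoration (hypothetical structure)."""
    summary: str
    owner: str
    opened_at: datetime
    resolved_at: Optional[datetime] = None

    def time_spent(self, now: datetime) -> timedelta:
        """Troubleshooting time so far, or total time if already resolved."""
        return (self.resolved_at or now) - self.opened_at


def overdue(issues, now, time_box=timedelta(minutes=30)):
    """Open issues whose troubleshooting time exceeds the time box."""
    return [i for i in issues if i.resolved_at is None and i.time_spent(now) > time_box]


def total_troubleshooting(issues, now):
    """Total time spent on all issues, open and resolved."""
    return sum((i.time_spent(now) for i in issues), timedelta(0))


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    log = [
        Issue("DNS failover", "network team", datetime(2024, 1, 1, 11, 0)),
        Issue("License server", "app team", datetime(2024, 1, 1, 11, 45)),
    ]
    for issue in overdue(log, now):
        print(f"Escalate: {issue.summary} ({issue.owner})")
```

The point of the time box is to prompt an escalation or a change of approach before a single issue quietly consumes the recovery window.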

Phase 4: Post-Restoration 

Many regard these next items as a subset of restoration, but they differ sufficiently to merit treatment as a separate phase. Here are the main things to consider in this phase: 

  • Identify the application or process changes that need to be made (for example, think about interfaces with third-party vendors). 
  • Identify potential points of data loss. (What data will you need to recreate?) This is an area we often assume is “OK”; however, data protection synchronization is frequently found to be off, invalidating the assumption.  
  • Determine whether integrations and interfaces self-heal or need manual intervention. 
  • Be ready to adjust processing activities if necessitated by performance or capacity issues. 
  • Consider whether changes need to be made as a result of dependency issues. (Think about the integration of systems with different RTOs.) You might have critical systems up and running, but upstream or downstream environments might still be unavailable. 
  • Prior to turnover, perform functional validation at both the IT level and the business level. (It is much easier to identify and correct issues prior to production turnover than several hours into data and transaction flow.) 
  • Ensure that backups at the alternate location are running and functional. (You do not want to lose all the work performed during an event once the environments are productive again.) 
  • Start planning for the move back to the primary location. The transition will amount to another DR event, but this shift will be planned and controlled.  
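
The data loss point lends itself to a quick calculation. Here is a minimal Python sketch that compares each system's last good copy against the time of the outage to show how much data might need to be recreated, and how far out of step the systems are with one another. The system names, timestamps, and 15-minute tolerance are invented for the example.

```python
from datetime import datetime, timedelta


def data_loss_windows(last_sync, outage_at):
    """Per system, the span of changes made after the last good copy."""
    return {
        name: max(outage_at - synced, timedelta(0))
        for name, synced in last_sync.items()
    }


def needs_recreation(last_sync, outage_at, tolerance):
    """Systems whose data loss window exceeds an assumed tolerance."""
    windows = data_loss_windows(last_sync, outage_at)
    return sorted(name for name, gap in windows.items() if gap > tolerance)


def sync_skew(last_sync):
    """Gap between the most and least recently synchronized systems."""
    return max(last_sync.values()) - min(last_sync.values())


if __name__ == "__main__":
    outage = datetime(2024, 1, 1, 12, 0)
    # Hypothetical timestamps of the last good replica or backup per system
    last_sync = {
        "erp": datetime(2024, 1, 1, 11, 55),
        "crm": datetime(2024, 1, 1, 9, 0),
    }
    print(data_loss_windows(last_sync, outage))
    print(needs_recreation(last_sync, outage, timedelta(minutes=15)))
    print(sync_skew(last_sync))
```

A large skew between integrated systems is a warning sign in its own right: even if each system restores cleanly, records that flowed between them after the older copy was taken may no longer match.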

Ensuring the Ability to Recover 

The increasing reliance of many companies on third-party computing services should not lead them to underestimate the importance of IT disaster recovery. The complexity of modern IT environments demands a thorough understanding of the four essential phases of IT/DR: Preparation, Assessment, Restoration, and Post-Restoration.  

Proper preparation, including service prioritization and testing, lays the foundation for effective recovery efforts. The assessment phase is crucial for understanding the scope of the outage and making informed decisions. During restoration and post-restoration, clear communication, diligent tracking, and meticulous attention to detail ensure a smoother transition back to normal operations. By mastering these phases, IT departments and BC professionals can ensure their systems remain recoverable even as environments grow more layered and the challenges of recovery more complex. 

Richard Long is one of MHA’s practice team leaders for Technology and Disaster Recovery related engagements. He has been responsible for the successful execution of MHA business continuity and disaster recovery engagements in industries such as Energy & Utilities, Government Services, Healthcare, Insurance, Risk Management, Travel & Entertainment, Consumer Products, and Education. Prior to joining MHA, Richard held Senior IT Director positions at PetSmart (NASDAQ: PETM) and Avnet, Inc. (NYSE: AVT) and has been a senior leader across all disciplines of IT. He has successfully led international and domestic disaster recovery, technology assessment, crisis management and risk mitigation engagements.

© 2024 · MHA Consulting. All Rights Reserved.
