Hope is a great quality for a baseball team in a pennant race, but it is not much of a business continuity strategy. This is why it is critical that organizations conduct regular disaster recovery tests of their information technology systems, also known as DR tests or IT/DR tests.
These tests tell you whether you can actually recover your systems after a disruption and what issues are likely to come up while you try. In short, they reveal capability and identify gaps.
In some industries, IT/DR tests might not simply be a good idea; they might be required. This is often the case, for example, with companies that must satisfy FDA requirements or meet SOX reporting mandates.
For all of these reasons, it’s well worth it for your team to learn how to conduct such tests properly and ensure they are well-designed and meet their goals. To help you do this, today we are sharing our “8 Dos and 1 Don’t for Conducting IT/DR Exercises.”
The Dos for Disaster Recovery Tests
Here are the eight things we believe you and your team should do when conducting IT/DR exercises:
1. Do define the reason for the test.
Why are you conducting the exercise? What do you hope to find out? There are many good reasons for conducting an IT/DR exercise, and one test can serve multiple purposes. Here are a few things you can accomplish by performing disaster recovery tests:
- Verifying technical capability, such as whether you can bring up your servers or do a data backup.
- Verifying application functionality, such as whether you can run a certain function.
- Training staff (including secondary and tertiary people) on procedures and ensuring they have the needed skill sets.
- Verifying processes and procedures.
- Satisfying an audit requirement.
- Performing maintenance on the DR environment.
2. Do define the type of exercise.
There are several different kinds of IT/DR exercise. Which kind will you be conducting? Here are the main options:
- Tabletop. Review your documentation to make sure things make sense. Can help find major gaps and dependency issues.
- Smoke test. Bring up certain technologies such as servers or an application environment. Log in but stop short of performing extensive functionality testing. Also called a stand-up test.
- Application verification in isolation. Test a minimal number of applications without looking at adjacent technologies.
- Multiple applications with integration testing. Look at multiple applications and verify they can function in concert with integrated technologies.
- Functional (non-IT) testing. Conduct one of the above technology disaster recovery tests while also looking at the integration of technology and business processes.
3. Do define the scope of the exercise.
Exercises can vary widely in scope. Some look to verify that you can bring up a single application (such as your inventory app or financial app). Others look at multiple apps or at the integration between apps, whether internal or external. Some exercises might look at the full lifecycle of a business process, for example, the order and inventory lifecycle, taking something from order to delivery to receipt and invoice.
4. Do identify the participants.
Based on the scope, identify which resources (servers, applications, etc.) and support personnel you need to conduct the exercise.
5. Do make sure the test environment is prepared.
In many cases, a test environment must be set up specifically for the test. It might be necessary to set up a special network, for example. The test environment ensures the exercise does not impact the production environment. (Note: Preparing the test environment is different from preparing your systems so they perform better during the exercise. That sort of preparation is discussed below, in Do #7.)
6. Do try to simulate the production environment as it would be in a real event.
Try to make the test environment mirror as closely as possible the way the environment would be set up in a real event.
7. Do determine the level of notification for the exercise.
How much notification you provide depends on your goals and priorities. The less notification you give, the more the exercise will resemble a real disruption—and the more accurate a picture it will provide of your current recovery capability. The more notification you provide, the more people will have the opportunity to fix problems before the test happens. These fixes might result in long-term improvements in resiliency. Organizations with mature IT/DR programs should be capable of performing very well on unannounced exercises. There are three basic levels of notification:
- The exercise is scheduled and publicized, and the staff prepares heavily, prechecking the environment, documentation, dependencies, and so on.
- The exercise is scheduled and publicized, but the staff does not make any special preparations to the environment, documentation, or other aspects of the recovery plan.
- The exercise is not publicized. It is sprung on the participants without advance notice.
8. Do determine exercise logistics.
Decide whether, during the exercise, the company will follow its regular crisis communication plan and procedures or use a pre-planned special organization. The special organization might use a defined bridge line, email list, schedule of status calls, and so on.
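One way to make sure all eight Dos have been decided before an exercise is to write them down as a single record. The following is a minimal sketch only: the `ExercisePlan` class, its field names, and the `missing_items` check are hypothetical illustrations and do not come from any particular tool or standard.

```python
from dataclasses import dataclass
from enum import Enum


class ExerciseType(Enum):
    # The five kinds of exercise listed under Do 2.
    TABLETOP = "tabletop"
    SMOKE_TEST = "smoke test"
    APP_IN_ISOLATION = "application verification in isolation"
    MULTI_APP_INTEGRATION = "multiple applications with integration testing"
    FUNCTIONAL = "functional (non-IT) testing"


class Notification(Enum):
    # The three levels of notification listed under Do 7.
    PUBLICIZED_WITH_PREP = "scheduled and publicized, heavy preparation"
    PUBLICIZED_NO_PREP = "scheduled and publicized, no special preparation"
    UNANNOUNCED = "not publicized"


@dataclass
class ExercisePlan:
    """Hypothetical record with one field per Do."""
    reasons: list[str]               # Do 1: why the test is being run
    exercise_type: ExerciseType      # Do 2
    scope: list[str]                 # Do 3: applications or processes covered
    participants: list[str]          # Do 4: resources and support personnel
    environment_ready: bool          # Do 5
    mirrors_production: bool         # Do 6
    notification: Notification       # Do 7
    uses_regular_crisis_comms: bool  # Do 8: False means a special organization

    def missing_items(self) -> list[str]:
        """Return the planning items that are still empty or not ready."""
        gaps = []
        if not self.reasons:
            gaps.append("reason for the test")
        if not self.scope:
            gaps.append("scope")
        if not self.participants:
            gaps.append("participants")
        if not self.environment_ready:
            gaps.append("test environment")
        return gaps
```

A plan with an empty `reasons` list and `environment_ready=False` would report both items as gaps, which is a prompt to go back to Do 1 and Do 5 before scheduling anything.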
The Don’t for Disaster Recovery Tests
There is also one thing you should definitely not do if you want your disaster recovery tests to give real insight into your recovery program and identify its gaps:
Don’t do the same exercise all of the time.
We often see organizations that get good at one type of exercise, then perform that one exercise over and over again while persuading themselves they must be in great shape because they perform DR tests all the time and always do well. This would be like someone who goes to the gym every day but limits themselves to lifting dumbbells with their right hand. Sure, their right bicep might be strong, but what about their aerobic capacity, flexibility, or core strength? A resilient IT/DR environment has well-rounded fitness. This is why it’s important not to perform only one type of DR exercise. Mix it up!
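If you keep a simple log of past exercises, spotting this pattern can be automated. The sketch below is illustrative only: the function names are hypothetical, the type labels are simply the five exercise types from Do 2, and the 80 percent threshold is an arbitrary assumption rather than a recommended benchmark.

```python
from collections import Counter

# The five exercise types from Do 2, used here as plain labels.
EXERCISE_TYPES = [
    "tabletop",
    "smoke test",
    "application verification in isolation",
    "multiple applications with integration testing",
    "functional (non-IT) testing",
]


def untested_types(history):
    """Return the exercise types that never appear in the history."""
    seen = Counter(history)
    return [t for t in EXERCISE_TYPES if seen[t] == 0]


def is_repetitive(history, threshold=0.8):
    """Return True when one type makes up at least `threshold` of all exercises.

    The default threshold is an assumption chosen for illustration.
    """
    if not history:
        return False
    most_common_count = Counter(history).most_common(1)[0][1]
    return most_common_count / len(history) >= threshold
```

For a team whose log contains nothing but smoke tests, `is_repetitive` returns `True` and `untested_types` lists the other four kinds of exercise as candidates for the next test.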
Gaining Insight and Boosting Resiliency
Conducting IT/DR exercises is critical for gaining insight into the recoverability of your environment and increasing its resiliency. You might also be required to perform such exercises for audit reasons. By following the 8 Dos and 1 Don’t set out in this post, you can increase your chances of achieving an effective IT/DR test program.
For Further Reading
For more information on some of the topics mentioned above, check out these recent MHA Consulting and BCMMETRICS blog posts:
- Kill the Zombies, or How to Get More From Your DR Exercises
- 4 Metrics to Help Your Organization Improve at Crisis Management
- 7 Habits of a Good Business Continuity Manager