The Business Impact Analysis (BIA) is always a hot topic in business continuity. Everybody wants to know how to do one.
This interest is not misplaced, since the BIA is a critical part of an organization’s business continuity program.
Here’s our Business Impact Analysis (BIA) definition: A BIA provides you with a clear picture of the criticality of your business operations based on the processes they perform, and helps you identify the dependencies (i.e., the computer systems, vital records, etc.) that must be in place for those processes to run. In essence, it serves as the foundation of any good continuity strategy. Once you understand which business processes are most critical to the livelihood of your company, you can then use this information to build an effective strategy that addresses only those areas that need to be recovered and the designated time frame in which to recover them.
However, it is important to remember that the BIA is a waste of time if the organization neglects to use the results to correctly define and establish Recovery Time Objectives (RTOs).
If this is not done, the rest of your program will be seriously handicapped.
In today’s post, we’ll try to shed a little light on this often oversimplified but extremely important topic. Specifically, we will:
- Remind you of what the recovery time objective (RTO) is.
- Explain why it’s important to get your RTOs right.
- Share some tips on how to establish your organization’s RTO categories and assign your processes to the appropriate categories.
- Look at some common mistakes organizations make in setting and assigning their RTOs.
What is a Recovery Time Objective?
Recovery Time Objective (RTO) is the time in which a business process and its associated applications must be functional again after an outage event in order to prevent a defined amount of impact. In other words, the RTO is the target time for restoring a business process to a functional state, close to where it was before the disruption.
RTOs are typically units of time such as 4 hours after the event, 12 hours after the event, 24 hours after the event, and so on. For less time-sensitive processes, they might be days or weeks after the event. The most critical and time-sensitive processes might have an RTO of 0 hours. For processes with an RTO of 0 hours, systems must be in place to ensure they never go down. An important note on RTO: from a disaster recovery perspective, the RTO clock starts when the recovery starts (or is approved), not from the start of the event.
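To make the note about the RTO clock concrete, here is a minimal sketch of the arithmetic. The timestamps, the 4-hour RTO, and the `recovery_deadline` helper are all hypothetical and chosen only for illustration; they are not taken from any particular plan or tool.

```python
from datetime import datetime, timedelta


def recovery_deadline(recovery_started: datetime, rto_hours: float) -> datetime:
    """Return the time by which the process should be functional again.

    Assumes the RTO clock starts when recovery starts (or is approved),
    not when the event itself began.
    """
    return recovery_started + timedelta(hours=rto_hours)


# Hypothetical timeline: the outage begins at 02:00 and recovery is approved at 03:30.
event_start = datetime(2024, 1, 15, 2, 0)
recovery_approved = datetime(2024, 1, 15, 3, 30)

deadline = recovery_deadline(recovery_approved, rto_hours=4)
elapsed_since_event = deadline - event_start

print(deadline)             # 2024-01-15 07:30:00
print(elapsed_since_event)  # 5:30:00
```

In this made-up case, a process with a 4-hour RTO could be down for five and a half hours in total, because 90 minutes passed before recovery was approved.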
Typically, an organization establishes around five RTO categories, some short and some long. Processes, and their associated systems and applications, are placed in appropriate categories according to the process criticality determined in the BIA. Remember, we use the BIA to determine the impacts (both quantitative and qualitative) to each process over time in order to establish how soon each needs to be restored to avoid serious impacts to the organization.
The importance of the RTOs is that they are the foundation of the rest of the recovery plan. They guide the development of your recovery strategies and technical implementation.
The RTOs are the goal; the recovery plan is the series of steps needed to meet the goal. Trying to write a recovery plan without having a clear picture of the RTOs is like planning a trip without having chosen a destination.
Why It’s Important to Get the Recovery Time Objectives Right
The main reasons it is important to get the RTOs right are simply stated:
- If you make the RTO for a process shorter than necessary, you will spend more time (and money) than required to protect the organization from disruptions to that process.
- The significant expense needed to meet an unnecessarily short RTO may also cause senior management to shy away from investing in further recovery planning.
- If you make the RTO too long, the disruption might cause serious impacts to the organization because even though the process is recovered within the time window stated in the recovery plan, it is not recovered as soon as it truly needs to be.
For every process, and for every environment or application that supports it, it is vital to make an informed judgment regarding how long that process can be interrupted before it causes significant impacts to the organization.
Tips on Setting and Assigning RTOs
Considering how important it is to set and assign RTOs properly, how do you go about doing this?
Defining the RTO categories prior to the BIA is just as important as performing the BIA to determine the correct RTO for a process. Without the appropriate RTO categories, the processes may not have a realistic recovery time.
The best way to determine the RTO for a process is to make sure that this task is included as part of your BIA. Estimating the dollar and non-dollar impacts of a disruption helps you arrive at a realistic and objective RTO. But even without a complete BIA, you can still determine RTOs through a less formal process.
Here are a few tips:
Defining the RTO Categories
- The key to coming up with the appropriate categories (and placing your processes in them) is understanding at a high level what your critical processes are and what the impacts will be if they are interrupted.
- For most organizations, a good number of RTO categories is between 4 and 6.
- Determine general impact guidelines. If there really is no process that needs to be available in less than 4 hours, do not create that category; less than 8 hours or less than 24 hours may be your first RTO category.
- Many companies go with an RTO category (or timeframe) breakdown similar to this:
  - Highly available
  - Less than 8 hours
  - Less than 12 hours
  - Less than 24 hours
  - Less than 5 days
  - More than 5 days
- No universal recommendation can be made on the distribution of categories to use because this varies widely by company.
- In talking about RTOs, critical should be understood to mean “time-sensitive.” There are many processes that are critical to the organization in the long term, but which are not especially time sensitive. Accounts payable is an example.
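As a rough illustration, the sample category breakdown above can be written down as data, with a small function that places a process in a category. This is only a sketch under stated assumptions: the `category_for` function and its conservative rule (pick the longest category that still finishes before serious impact begins) are hypothetical, "Highly available" is modeled as 0 hours, and anything that can tolerate more than 5 days falls into the final category.

```python
# Bounded categories from the sample breakdown: (name, upper bound in hours).
# Modeling "Highly available" as 0 hours is an assumption for this sketch.
BOUNDED_CATEGORIES = [
    ("Highly available", 0),
    ("Less than 8 hours", 8),
    ("Less than 12 hours", 12),
    ("Less than 24 hours", 24),
    ("Less than 5 days", 5 * 24),
]
OPEN_ENDED_CATEGORY = "More than 5 days"


def category_for(tolerable_outage_hours: float) -> str:
    """Return the longest category whose bound does not exceed the tolerable outage.

    tolerable_outage_hours is the (hypothetical) BIA estimate of how long the
    process can be down before a serious impact occurs.
    """
    if tolerable_outage_hours < 0:
        raise ValueError("tolerable outage cannot be negative")
    longest_bound = BOUNDED_CATEGORIES[-1][1]
    if tolerable_outage_hours > longest_bound:
        return OPEN_ENDED_CATEGORY
    chosen = BOUNDED_CATEGORIES[0][0]
    for name, max_hours in BOUNDED_CATEGORIES:
        if max_hours <= tolerable_outage_hours:
            chosen = name
    return chosen


print(category_for(0))    # Highly available
print(category_for(10))   # Less than 8 hours
print(category_for(24))   # Less than 24 hours
print(category_for(200))  # More than 5 days
```

Under this rule, a process that tolerates only 4 hours lands in "Highly available" because no bounded category fits below 8 hours. That is a consequence of the sample categories, not a recommendation; your own breakdown should reflect the impact guidelines you determined.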
Assigning the Recovery Time Objective (during a BIA)
- Look at your processes one by one and determine – for each process – at what point an interruption will cause a serious impact to the company. You may see a serious impact in as little as 4 hours, or you might not see one for several days.
- The shorter the RTO, the greater the expense will be to meet it.
- To assess the impact that may be caused by the interruption of a process, you should look at both dollar-based (quantitative) impacts (e.g., revenues) and non-dollar (qualitative) impacts (e.g., customer service, employee safety).
- The relative criticality of the same function is often different depending on the type of business. For example, the impact of losing the company’s external website is likely greater at an e-retailer than at a manufacturing company.
- Organizations’ brands and priorities should guide how they determine RTOs. Some organizations are more focused on customer service and employee safety, while others put more emphasis on revenue or manufacturing capability.
- Distributing your processes among RTO categories is an exercise in prioritization (again, a good BIA will do this for you).
- In most organizations, the majority of processes are not that time sensitive – typically only 25 percent of processes need to be recovered within 24 hours.
- A process should be assigned an RTO of 0 only if its disruption would cause a high impact across multiple areas, such as revenue, customer service, and safety.
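The first tip in this list, finding the point at which an interruption starts to cause a serious impact, can be sketched in a few lines. Everything here is hypothetical: the process names' ratings, the 1-to-5 impact scale, the `SERIOUS` threshold, and the checkpoints at 4, 8, 24, and 120 hours are invented for illustration and would come from your own BIA interviews in practice.

```python
from typing import Dict, Optional, Tuple

SERIOUS = 3  # hypothetical threshold on a 1-5 impact scale

# Hypothetical BIA ratings: hours of outage -> (quantitative, qualitative) impact.
impact_over_time: Dict[str, Dict[int, Tuple[int, int]]] = {
    "Order processing": {4: (1, 2), 8: (2, 3), 24: (4, 4), 120: (5, 5)},
    "Accounts payable": {4: (1, 1), 8: (1, 1), 24: (1, 2), 120: (2, 3)},
    "Internal newsletter": {4: (1, 1), 8: (1, 1), 24: (1, 1), 120: (1, 2)},
}


def hours_until_serious_impact(ratings: Dict[int, Tuple[int, int]]) -> Optional[int]:
    """Return the earliest checkpoint where either impact type reaches SERIOUS.

    Returns None if no checkpoint in the ratings reaches the threshold.
    """
    for hours in sorted(ratings):
        if max(ratings[hours]) >= SERIOUS:
            return hours
    return None


results = {
    name: hours_until_serious_impact(ratings)
    for name, ratings in impact_over_time.items()
}

# Most time-sensitive first; processes with no serious impact go last.
prioritized = sorted(results, key=lambda n: (results[n] is None, results[n] or 0))

for name in prioritized:
    print(name, results[name])
# Order processing 8
# Accounts payable 120
# Internal newsletter None
```

Taking the worse of the quantitative and qualitative ratings is one simple design choice; an organization that weighs revenue more heavily than customer service, or the reverse, might combine them differently.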
Common Mistakes in Working with Recovery Time Objectives
The following are some of the most common mistakes organizations make in establishing RTO categories and assigning processes to them:
- They make too many categories. As stated above, 4 to 6 categories is about right for most companies. If your system gets too granular, it becomes very difficult to implement.
- Organizations overestimate the time criticality of processes. This creates expense for the company for no good reason.
- People confuse time sensitivity in RTO terms with criticality to the organization overall. RTO is strictly about impacts to the organization caused by time-limited interruptions. Just because something is strategically important to the company doesn’t mean it needs a short RTO.
- People mix up resiliency and recovery. RTOs are about process recovery, and by extension disaster recovery and emergency event recovery. They’re not about day-to-day resiliency.
Those are some of the main things to keep in mind when establishing RTO categories and assigning your processes to them. Once you’ve done that, you’re well-positioned to write recovery plans that provide meaningful protection to your organization.