RTO and RPO: Making It Simple

Everyone involved in business continuity management (BCM) knows that the concepts of RTO and RPO are important. Exactly what they are, how they should be used, why they matter, and how to establish them may be less well understood. In today’s blog, we’ll lay out the facts about these key concepts as simply as possible.

Two Critical Concepts

Recovery time objective (RTO) and recovery point objective (RPO) are two concepts that everyone involved in BCM has heard of (we will define them shortly). The details of what they mean and why they are important, much less how to go about establishing them throughout an organization, can be confusing.

Let’s try to clear up some of the confusion.

RTO and RPO are indeed very important in developing a solid BCM program. Determining them is a prerequisite for developing sound business and technical recovery strategies.

Each key business process or technical process in the organization should have an RTO, determined through impact analysis. The applications or systems supporting each business process will have an RPO identified.

RTOs and RPOs pertain specifically to the requirements around recovery or resiliency needs in the event of serious events and outages. They do not come into play in the ordinary day-to-day operation of our organization.

Both concepts are measured in terms of elapsed time (hours, minutes, or sometimes days). RTO is counted forward from the start of an outage, while RPO is counted backward from it.
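
To make the direction of each measurement concrete, here is a small Python sketch. The function name recovery_window and the example times are hypothetical; the sketch simply counts the RTO forward from the start of an outage and the RPO backward from it.

```python
from datetime import datetime, timedelta


def recovery_window(outage_start: datetime, rto: timedelta, rpo: timedelta):
    """Return (restore_deadline, oldest_acceptable_recovery_point).

    Hypothetical helper: the RTO is counted forward from the start of the
    outage, the RPO backward from it.
    """
    restore_deadline = outage_start + rto
    oldest_recovery_point = outage_start - rpo
    return restore_deadline, oldest_recovery_point


# Hypothetical example: outage at 14:00, RTO of 8 hours, RPO of 4 hours.
outage = datetime(2021, 3, 1, 14, 0)
deadline, point = recovery_window(outage, timedelta(hours=8), timedelta(hours=4))
print(deadline)  # 2021-03-01 22:00:00
print(point)     # 2021-03-01 10:00:00
```

In this example the process must be running again by 22:00, and data created before 10:00 must be recoverable from protected copies; the four hours between 10:00 and the outage are what staff would be expected to recreate manually.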

Knowing the RTOs and RPOs for the processes and technologies used across your organization helps you understand how to protect both your processes and your technology. It helps ensure that your strategies, implementation, and plans are neither overly aggressive (wasting resources) nor inadequate (providing insufficient protection).

Let’s look at these key concepts one at a time.

Recovery Time Objective (RTO)

The recovery time objective (RTO) is the amount of time in which, following an outage, a business process and its associated applications must be restored in order to prevent a defined amount of impact.

It represents a determination of when it is essential to get a process functional again in order to prevent significant damage to the organization. It is arrived at through a combination of analysis of the company’s overall operations and prioritization by staff.

Recovery Point Objective (RPO)

The recovery point objective (RPO) is somewhat trickier to understand. It describes a capability, and that capability establishes the requirements for data protection.

The capability that RPO refers to has to do with the organization’s ability to recreate lost data. Specifically, the RPO refers to how much data could be manually recovered following the restoration of an application. This determines the appropriate data protection strategy for the underlying data in an application.

Manually recovering the data means recreating it by various methods such as reproducing it from memory, locating it in other applications or in hard copy, or contacting customers and asking them to resubmit their orders.

RPO is determined by identifying how much data, for a given application, the staff could manually recreate (not as a routine matter, but rather following a serious outage). Could the staff recreate up to two hours’ worth of data? Up to eight hours’ worth? Up to twenty-four hours’ worth? This is the question to be answered to determine the RPO for a given process or application.

RTOs vs. RPOs

Let’s look at some general facts about RTOs and RPOs.

  • The two values are independent of one another. There’s no correlation.
  • The RTO is about when the business process and its associated applications must be recovered to limit the damage to the organization.
  • The RPO is about how much data could be manually recovered, if this became necessary.
  • It is possible for a business process and its associated applications to have a short RTO and a long RPO.
  • A business process and its associated applications can also have a long RTO and a short RPO.
  • RTOs vary widely, depending on the criticality of the process to the organization.
  • With both RTO and RPO, you have to plan for the worst-case scenarios.
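
The worst-case point matters especially for RPO, because a data protection strategy has to be judged by the most data it could lose, not by the typical loss. The Python sketch below assumes, purely for illustration, that an application’s data is protected only by periodic backups, each of which captures data as of the moment it starts and is unusable until it finishes. The names worst_case_data_loss and meets_rpo are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss for a fixed schedule of periodic backups.

    Assumes each backup captures data as of its start and cannot be used
    until it finishes. The worst moment for an outage is just before the
    next backup finishes: everything since the start of the previous
    backup is lost, i.e. one interval plus one backup duration.
    """
    return backup_interval + backup_duration


def meets_rpo(rpo: timedelta, backup_interval: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    # Plan for the worst case, not the average case.
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hypothetical: nightly backups (every 24 hours, 1 hour to run) vs. a 12-hour RPO.
print(meets_rpo(timedelta(hours=12), timedelta(hours=24), timedelta(hours=1)))    # False
# Hypothetical: backups every 3 hours, 30 minutes to run, vs. a 4-hour RPO.
print(meets_rpo(timedelta(hours=4), timedelta(hours=3), timedelta(minutes=30)))   # True
```

Under these assumptions, a nightly backup cannot support a 12-hour RPO, while a three-hour cycle can support a four-hour one.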

Devising Your Categories

Every organization must devise a scale of RTO and RPO categories for itself. It is best to limit these to around five or six categories for each objective. Having more can be a maintenance nightmare.

The following is a scale of RTOs that we have seen work well for many organizations:

RTO 0 Immediate/high availability
RTO 1 < 8 hours
RTO 2 < 24 hours
RTO 3 < 72 hours
RTO 4 < 5 days
RTO 5 > 5 days

And here is a scale of RPOs that many organizations have used successfully:

RPO 0 Zero data loss
RPO 1 < 4 hours of data loss
RPO 2 < 12 hours
RPO 3 < 24 hours
RPO 4 > 24 hours
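
Once a scale is in place, placing a process into a category is a simple lookup. The Python sketch below encodes the two example scales (the names RTO_SCALE, RPO_SCALE, and categorize are hypothetical) and assumes one rounding rule that is not spelled out above: a requirement is rounded toward the tighter category, so the assigned category always meets it. For example, a process that must be restored within 10 hours falls between RTO 1 (< 8 hours) and RTO 2 (< 24 hours); RTO 2 would be too slow, so it is placed in RTO 1.

```python
from datetime import timedelta

# Hypothetical encoding of the two example scales: (category, bound) pairs,
# tightest first. A bound of zero stands for "immediate/high availability"
# on the RTO scale and "zero data loss" on the RPO scale.
RTO_SCALE = [("RTO 0", timedelta(0)), ("RTO 1", timedelta(hours=8)),
             ("RTO 2", timedelta(hours=24)), ("RTO 3", timedelta(hours=72)),
             ("RTO 4", timedelta(days=5))]
RPO_SCALE = [("RPO 0", timedelta(0)), ("RPO 1", timedelta(hours=4)),
             ("RPO 2", timedelta(hours=12)), ("RPO 3", timedelta(hours=24))]


def categorize(tolerance: timedelta, scale, open_ended: str) -> str:
    """Pick the most lenient category whose bound does not exceed the tolerance.

    Rounding toward the tighter category means the assigned category
    always satisfies the requirement found during analysis.
    """
    if tolerance > scale[-1][1]:
        return open_ended  # beyond the last bound, e.g. "> 5 days" or "> 24 hours"
    chosen = scale[0][0]
    for name, bound in scale:
        if bound <= tolerance:
            chosen = name
    return chosen


# Hypothetical process: must be back within 10 hours; staff could
# manually recreate about 6 hours of data.
print(categorize(timedelta(hours=10), RTO_SCALE, "RTO 5"))  # RTO 1
print(categorize(timedelta(hours=6), RPO_SCALE, "RPO 4"))   # RPO 1
```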

Once a company devises its categories, each of its key business processes is analyzed and placed into an RTO category and an RPO category. These designations guide the subsequent development of the company’s recovery plans and strategies.
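
As an illustration of how the designations might be recorded, the sketch below keeps each process’s RTO and RPO categories as two separate fields, reflecting that the two values are independent. The class name ProcessDesignation, the process names, and the category assignments are all hypothetical, as is the assumption that recovery work is sequenced by RTO category, tightest first.

```python
from dataclasses import dataclass


@dataclass
class ProcessDesignation:
    """One row of a hypothetical register: a process and its two categories."""
    process: str
    rto_category: int  # 0 (immediate) .. 5 (> 5 days)
    rpo_category: int  # 0 (zero data loss) .. 4 (> 24 hours)


register = [
    # Hypothetical entries; RTO and RPO are assigned independently.
    ProcessDesignation("Order entry", rto_category=1, rpo_category=3),       # short RTO, long RPO
    ProcessDesignation("Payroll", rto_category=4, rpo_category=0),           # long RTO, short RPO
    ProcessDesignation("Customer support", rto_category=2, rpo_category=2),
]

# Tightest RTO first: one possible starting point for sequencing recovery work.
recovery_order = sorted(register, key=lambda p: p.rto_category)
for p in recovery_order:
    print(f"{p.process}: RTO {p.rto_category}, RPO {p.rpo_category}")
```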

Determining RTOs and RPOs

Beyond the general guidelines given above, how does a company go about determining the most appropriate RTOs and RPOs for its processes and applications?

The BCM office develops proposed RTOs and RPOs based on the organization’s known risks and needs. The IT team is a good place to start: the recovery times built into its current protection and recovery strategies provide initial values. The BCM office can then adjust those values based on discussions with management about how quickly each department would generally need to be recovered.

Making the best choices depends on drawing on information and insights held at many different levels of the organization.

The final decisions regarding RTOs and RPOs should emerge after a process of data gathering and collaborative discussion. Once the proposed values are defined, they should be submitted to upper management for review.

Throughout this process, the BCM office has the job of educating others, facilitating the discussion, seeking consensus, and obtaining the necessary approvals.

Keeping Up to Date

Every organization should review its RTOs and RPOs on a regular basis, because organizations and their environments change. A company that has outgrown its recovery plan has no recovery plan. It is critical that RTOs and RPOs be kept up to date.

The need for companies to regularly review and update their RTOs and RPOs has never been greater than it is now. After a year of the pandemic and working from home, very few companies today are in the same posture they were in even a year ago.

The pace of change will continue to be swift as organizations chart their post-pandemic futures in the coming months.

Summing Up

Recovery time objective (RTO) and recovery point objective (RPO) are two of the fundamental concepts in business continuity.

The RTO addresses how soon after an outage a business process and its associated applications must be recovered to limit the damage to the organization. The RPO is about how much data could be manually recovered, if this became necessary.

The RTO and RPO for each key business process and its associated applications must be determined through analysis of the company’s operations and priorities. They form the basis of the organization’s recovery plans and strategies.  

With the BCM office facilitating, the RTO and RPO for each key business process and its associated applications should be determined collaboratively. They should be reviewed and updated regularly to ensure that the company’s recovery plans and strategies can truly protect it in the case of an outage.  

About
Richard Long
Richard Long is one of MHA’s practice team leaders for Technology and Disaster Recovery related engagements. He has been responsible for the successful execution of MHA business continuity and disaster recovery engagements in industries such as Energy & Utilities, Government Services, Healthcare, Insurance, Risk Management, Travel & Entertainment, Consumer Products, and Education. Prior to joining MHA, Richard held Senior IT Director positions at PetSmart (NASDAQ: PETM) and Avnet, Inc. (NYSE: AVT) and has been a senior leader across all disciplines of IT. He has successfully led international and domestic disaster recovery, technology assessment, crisis management and risk mitigation engagements.