An IT disaster can cripple any business. In part one of this two-part series, we look at the steps to take to develop your disaster recovery plan.
As an IT manager, planning for a disaster is an essential part of the job.
When applications unexpectedly go offline, this hiccup can have a direct impact on your ongoing business operations and your bottom line. In some extreme cases, the data and monetary losses from unplanned outages can even cause a company to go out of business.
According to Gartner, the average cost of IT downtime is $5,600 per minute. Because organizations operate so differently, the figure varies widely: downtime averages $300,000 per hour and can reach as much as $540,000 per hour at the high end.
An ITIC survey reported that 98 percent of organizations say a single hour of downtime costs over $100,000. Also, 81 percent of respondents indicated 60 minutes of downtime costs their organization over $300,000. Thirty-three percent of those enterprises reported that one hour of downtime costs their firms from $1 million to $5 million.
Establishing a Plan
You have established that you need to plan for potential disasters that might affect your IT operations, and that you need a disaster recovery plan (DRP). Architecting this plan can be daunting. However, if you follow the process outlined in this blog, you can be confident that the plan will come together smoothly and align with your organization’s needs.
I’ll walk you through some cloud-optimized approaches to building a solid disaster recovery architecture using common blueprints. We’ll focus on AWS technologies throughout, but the principles apply to any cloud provider.
Imagine that your boss approaches you, says, “Make sure you back everything up,” and then goes on his or her merry way, assuming all will be fine when disaster strikes. What do you do? Would you back up your systems and databases daily to an S3 bucket and cross your fingers? That approach might work for some organizations, but definitely not for an online banking application or a large retail credit card processing system. Different applications have different backup and recovery requirements, and every organization has its own needs; there is no one-size-fits-all approach.
A disaster recovery plan that works for one organization may not work for another. Your DRP should be part of a much larger business continuity plan (BCP). A BCP is unique to your organization, and you should create it before building a disaster recovery plan. In government, this plan is called a continuity of operations plan (COOP).
Business Continuity Planning
Business continuity planning is the process of creating a system of prevention and recovery from potential threats to a company. The plan ensures that personnel and assets are protected and that the organization can resume functioning quickly in the event of a disaster. Generally, you should develop the BCP, which covers much more than IT assets, in advance, with input from key stakeholders and personnel.
BCP involves defining any and all risks that can affect the company’s operations, making it an important part of the organization’s risk management strategy. Risks may include natural disasters, such as floods and other weather-related events, and cyber-attacks. Once you identify the risks, the plan should also include:
- Understanding how those risks will affect operations
- Implementing safeguards and procedures to mitigate the risks
- Testing procedures to ensure they work
- Periodically reviewing the process to ensure that it is up to date
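The four activities above lend themselves to a simple risk register. The Python sketch below is one hypothetical way to record each risk’s impact and safeguard and to flag entries whose testing or review is overdue. The `Risk` fields, the `overdue` helper, the sample risks, and the one-year review interval are all illustrative assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Risk:
    """One entry in a hypothetical BCP risk register."""
    name: str
    impact: str           # how the risk would affect operations
    safeguard: str        # procedure in place to mitigate the risk
    last_tested: date     # when the safeguard was last tested
    last_reviewed: date   # when this entry was last reviewed


def overdue(risks, today, max_age=timedelta(days=365)):
    """Return names of risks whose last test or review is older than max_age.

    The one-year default is an assumption; use whatever cadence your BCP sets.
    """
    return [
        r.name
        for r in risks
        if today - r.last_tested > max_age or today - r.last_reviewed > max_age
    ]


# Hypothetical sample entries for illustration only.
register = [
    Risk("Flood at primary site", "Data center unavailable",
         "Replicate backups to a second region",
         last_tested=date(2023, 1, 10), last_reviewed=date(2024, 2, 1)),
    Risk("Ransomware", "Encrypted file shares",
         "Immutable, offline backups",
         last_tested=date(2024, 3, 5), last_reviewed=date(2024, 3, 5)),
]

print(overdue(register, today=date(2024, 6, 1)))  # ['Flood at primary site']
```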
BCPs are an important part of any organization. Threats and disruptions can mean a loss of revenue or, worse, extinction.
As part of a larger BCP, you must protect IT assets and your organization’s data and make them recoverable quickly enough, and with little enough data loss, that the organization survives without significant financial loss. This is where we define and implement a disaster recovery plan that meets the goals of the business continuity plan.
Disaster Recovery Plan
You should develop an information technology disaster recovery plan (IT DRP) in conjunction with your business continuity plan. During the business impact analysis, lay out priorities and recovery time objectives for information technology, and then create technology recovery strategies to restore hardware, applications and data in time to meet the needs of the organization.
Recovery point objective (RPO) and recovery time objective (RTO) are two of the most important factors driving an organization’s disaster recovery or data protection plan.
The RPO and RTO, along with a business impact analysis, provide the foundation for identifying viable strategies for inclusion in the business continuity plan. Viable strategy options include any plans which enable the resumption of a business process in a time frame at or near the RPO and RTO.
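Screening candidate strategies against the RTO and RPO can be sketched as a simple filter. In this hypothetical Python example, the strategy names and their achievable recovery times and data-loss windows are made up for illustration; real figures would come from testing your own environment. The filter applies a strict reading (a strategy must meet both objectives), whereas the text allows options “at or near” the targets, so you might add a tolerance.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    """A candidate recovery strategy and what it can realistically achieve."""
    name: str
    achievable_rto_hours: float  # time needed to restore service
    achievable_rpo_hours: float  # worst-case data loss, measured in time


def viable(strategies, rto_hours, rpo_hours):
    """Keep only strategies that meet both the RTO and the RPO."""
    return [
        s.name
        for s in strategies
        if s.achievable_rto_hours <= rto_hours and s.achievable_rpo_hours <= rpo_hours
    ]


# Hypothetical candidates with illustrative numbers.
candidates = [
    Strategy("Nightly backups, restore on demand", 24, 24),
    Strategy("Hourly snapshots, standby servers", 4, 1),
    Strategy("Continuous replication, hot standby", 0.25, 0.01),
]

print(viable(candidates, rto_hours=8, rpo_hours=4))
# ['Hourly snapshots, standby servers', 'Continuous replication, hot standby']
```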
RPO: Recovery Point Objective
The recovery point objective limits how far back in time you can roll back: it defines the maximum allowable amount of data loss, measured as the time between the last valid backup and the failure. In other words, the RPO designates how much data you can afford to lose, or will need to re-enter, after an outage.
For example, if the last good copy of data available at the time of an outage is from 18 hours ago, and the RPO for this business is 20 hours, we are still within the parameters of the business continuity plan’s RPO. The RPO answers the question: “Up to what point in time could the business process’s recovery proceed tolerably, given the volume of data lost during that interval?”
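The RPO check in the example above comes down to comparing two time spans. A minimal Python sketch, assuming you know the failure time and the timestamp of the last good backup (the `within_rpo` helper and the sample dates are hypothetical):

```python
from datetime import datetime, timedelta


def within_rpo(last_good_backup, failure_time, rpo):
    """True if the data-loss window (backup to failure) fits within the RPO."""
    data_loss_window = failure_time - last_good_backup
    return data_loss_window <= rpo


failure = datetime(2024, 5, 1, 2, 0)
last_backup = failure - timedelta(hours=18)  # last good copy is 18 hours old

print(within_rpo(last_backup, failure, rpo=timedelta(hours=20)))  # True
print(within_rpo(last_backup, failure, rpo=timedelta(hours=12)))  # False
```

If backups run on a fixed schedule, the worst-case data-loss window is roughly the interval between backups, so a daily backup to an S3 bucket cannot reliably meet an RPO shorter than 24 hours.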
RTO: Recovery Time Objective
The recovery time objective relates to downtime. It is the maximum acceptable time from the incident until normal operations are available to users again. In other words, the RTO answers the question: “How much time can the business tolerate for recovery after a business process disruption is identified?”
As an example, assume the BCP states an RTO of 12 hours. If a disaster strikes at 2 a.m., we have until 2 p.m. to restore service. If it takes two hours to restore all critical services, the restoration process must begin no later than noon to meet the 12-hour RTO.
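Working backward from the RTO gives the latest time restoration can start. This hypothetical helper reproduces the arithmetic in the example above, assuming the restore duration is known in advance (for instance, from recovery tests):

```python
from datetime import datetime, timedelta


def latest_restore_start(disaster_time, rto, restore_duration):
    """Latest time restoration can begin and still finish within the RTO."""
    deadline = disaster_time + rto          # service must be back by this time
    return deadline - restore_duration


disaster = datetime(2024, 5, 1, 2, 0)       # 2 a.m.
start_by = latest_restore_start(disaster,
                                rto=timedelta(hours=12),
                                restore_duration=timedelta(hours=2))
print(start_by.time())  # 12:00:00
```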
With a business continuity plan in place driving the need for a disaster recovery plan, you now have the imperative to build your backup and recovery procedures based on your organization’s defined RTO and RPO for each business application and the data used by the organization. Start this process by building a business impact analysis.
Building a Business Impact Analysis
First, you need to understand how critical each application is, along with its supporting hardware (servers and PCs) and the network infrastructure it requires, in case of an unplanned disruption. I recommend building a high-level summary of your critical applications, servers, and functions. This summary will help you evaluate recovery options for each application based on the impact on your organization if that application is unavailable for a specific period of time.
The objectives of the business impact analysis (BIA) are to:
- Identify key organization and revenue drivers
- Identify RTO and RPO for essential business functions and processes
- Identify the impact if critical business functions cannot operate due to an unplanned disruption
- Quantify monetary and workflow impacts if critical business functions cannot operate
- Identify intangible impacts if critical business functions cannot operate
- Identify high-level minimum acceptable recovery configurations (MARC) and resources required to support critical business functions
- Identify existing continuity documentation and incident response plans — review current business continuity preparedness
- Identify internal and external dependencies such as technology, telecommunications, records and service organizations
- Recommend strategic and tactical steps required to minimize the impact of an interruption on critical business functions
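Quantifying monetary impact can start with simple arithmetic. The sketch below assumes a constant hourly loss rate, which is a simplification: real losses often grow the longer an outage lasts, and intangible impacts are not captured. It converts the Gartner per-minute figure quoted earlier into an hourly rate and estimates the cost of a 12-hour outage (the `downtime_cost` name is illustrative).

```python
def downtime_cost(cost_per_hour, outage_hours):
    """Direct cost of an outage, assuming a constant hourly loss rate."""
    return cost_per_hour * outage_hours


# Gartner's average figure cited earlier: $5,600 per minute.
per_hour = 5_600 * 60
print(per_hour)                     # 336000
print(downtime_cost(per_hour, 12))  # 4032000
```

At $336,000 per hour, the per-minute figure is roughly in line with the $300,000-per-hour average cited above.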
Priority Applications and Servers
List the priority applications and servers that drive your organization. Define the most critical periods when you require these applications and assess the overall impact on your company if you cannot perform these functions. Ensure you define the two key metrics — RTO and RPO — for each application.
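The same list can also be kept as structured data so that it can be sorted into a recovery order. In this Python sketch, the application names, servers, critical periods, and RTO/RPO values are hypothetical placeholders; substitute the figures from your own BIA. Ordering by shortest RTO, then shortest RPO, is one reasonable convention, not a rule from the BCP itself.

```python
from dataclasses import dataclass


@dataclass
class AppRequirement:
    """Hypothetical BIA entry for one priority application."""
    name: str
    servers: list[str]
    critical_period: str
    rto_hours: float
    rpo_hours: float


def recovery_order(entries):
    """Names ordered from least to most downtime-tolerant (shortest RTO, then RPO)."""
    return [a.name for a in sorted(entries, key=lambda a: (a.rto_hours, a.rpo_hours))]


# Hypothetical entries for illustration only.
bia = [
    AppRequirement("Intranet wiki", ["web-03"], "Business hours", 72, 24),
    AppRequirement("Payment processing", ["app-01", "db-01"], "24x7", 1, 0.25),
    AppRequirement("Email", ["mail-01"], "Business hours", 8, 4),
]

print(recovery_order(bia))  # ['Payment processing', 'Email', 'Intranet wiki']
```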
See the graphic below as an example:
In this first part of the series, I outlined the prerequisites for building any good disaster recovery plan. The disaster recovery plan must meet the business recovery requirements that you define as part of the business continuity planning process. Without a business continuity plan, you are throwing darts at a dartboard, not truly knowing which data and systems need to be recoverable, to what recovery point and within what recovery time, to sustain the business and limit financial risk to the organization.
In part two of the series, I’ll walk you through four common disaster recovery blueprints you can use to align your disaster recovery plan to the business requirements defined in your business continuity plan.