DR 101: The Basics of Disaster Recovery

 

What is Disaster Recovery (DR)?

Disaster recovery is a way to recover from the worst outages you can imagine. Whether you are facing down earthquake, fire, tropical storm, flooding, ransomware, or even user error, the goal is to be able to recover quickly. The key is to find a way to avoid downtime and data loss by preparing your data for recovery before a disaster.

If your business sits in a region prone to earthquakes or hurricanes, you will need a separate location that will not be affected by the same sudden outages. When disaster strikes, you can fail over to that secondary location, out of harm’s way. This keeps your business running and your employees working through remote (and hopefully safe) connections from their own homes.

How about a tornado? When facing winds upwards of 320 mph, a house gets torn to shreds. Nothing is safe from being picked up, spun around, and tossed miles away from its home site. Once again, you will need a DR solution in a separate location, preferably not in Tornado Alley, to act as a secondary site for your business.

Perhaps ransomware has you in a bind. In this situation, the name of the game is sanitizing your site and then recovering from an uninfected version to get your business running again. You can do this by rolling back to an earlier version. Next, the trick is to get up and running again without re-infecting, while ensuring your business stays afloat.

The most common form of disaster, however, is user error. Maybe you patched a server before checking whether the patch would have any ill effects. Maybe someone deleted important data. In each of these cases, with a disaster recovery solution in place, you will be able to make that data available again. In the event of a patch issue, you can always boot the machine in question in an isolated environment and check how a new patch will affect your production servers.

Disaster Recovery Plan

For all situations that may result in downtime, data loss, or even full-on infrastructure loss, you will need a contingency plan in place. A disaster recovery plan will help your team outline the steps required to get up and running and minimize the cost to the business. For some, it is a matter of recovering files and making them available to the workforce. For others, it is getting critical servers in your infrastructure up and running at a moment’s notice so as to keep the web traffic flowing and transactions coming in.

Having a laminated “Steps to Recover” plan hanging in the IT (Information Technology) department or server room, with steps on who to contact and what to do first, will go a long way toward reducing downtime. But what if that is not enough?

Business Continuity versus Disaster Recovery

Business Continuity is an umbrella term that encompasses all things required for a business to keep running – not just IT – in the event of disruption. Where disaster recovery may be a specific solution for failing over your servers, business continuity is the company-wide plan, processes, and tools to avert or recover from major outages from every angle. It includes a list of vendors and personnel that need to be notified (and how). Business continuity also includes IT disaster recovery, in which everything is preconfigured so that you can quickly recover all critical systems, remap the network, and have clients and users fail over to the cloud with nothing more than the push of a button to start the failover event.

For the purposes of this article, I want to focus on the DR components of business continuity where orchestration is concerned. In preparing DR run books (also called DR playbooks), we’ll start with a list of servers and create an ordered sequence in which these servers should “fail over” (or recover). You will also want these run books to allow for delays between service startups, so that the services a server depends on are already running before it boots. Finally, you will want your failover LAN segmentation prepared beforehand, so that you are not wasting time configuring a new subnet, IP (Internet Protocol) range, or VPN (Virtual Private Network) during the outage. In some disaster recovery solutions, you might get lucky and have a playbook “automated” as a programmable orchestration.
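
As a rough illustration of the ordered sequence and startup delays described above, here is a minimal Python sketch. The server names, the `depends_on` and `settle_seconds` fields, and the helper functions are all hypothetical, invented for this example; they do not come from any particular DR product. The sketch only works out a boot order that respects dependencies and prints the plan, it does not boot anything.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical run book: each server lists the servers it depends on and
# how long to wait after booting it before moving on to the next step.
RUNBOOK = {
    "domain-controller": {"depends_on": [], "settle_seconds": 120},
    "sql-server": {"depends_on": ["domain-controller"], "settle_seconds": 90},
    "app-server": {"depends_on": ["sql-server"], "settle_seconds": 30},
    "web-server": {"depends_on": ["app-server"], "settle_seconds": 0},
}


def failover_sequence(runbook):
    """Return (server, settle_seconds) pairs with dependencies listed first."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(spec["depends_on"]) for name, spec in runbook.items()}
    order = TopologicalSorter(graph).static_order()
    return [(name, runbook[name]["settle_seconds"]) for name in order]


def print_plan(runbook):
    """Print the run book as numbered steps an operator could follow."""
    for step, (name, wait) in enumerate(failover_sequence(runbook), start=1):
        print(f"{step}. boot {name}, then wait {wait}s before the next step")


if __name__ == "__main__":
    print_plan(RUNBOOK)
```

Keeping the run book as data rather than as a hand-ordered list means a new dependency only has to be declared once, and a circular dependency is reported as an error by the sorter instead of being discovered during an outage.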

With all of this, you will want to test the full end-to-end orchestration regularly, not only to ensure that timing and sequencing work, but also to gain the administrative “muscle memory” for doing it in a real disaster. When all is said and done, that muscle memory is a few clicks of the mouse, supported by instinct and no fear of forgetting a portal URL or the password. Finally, it is vital to be experienced with your toolset and its administrative interface, just in case recovery plans require unexpected change. For business success, business continuity and disaster recovery cannot just be plans; they must be the execution behind the plan.

Recovery Point Objective (RPO) and Recovery Time Objective (RTO)

The RPO and RTO are often overlooked items in a disaster recovery plan. For the most effective plans, you will need to know the difference between what you must work with and what you can afford to work with. Think of each backup “snapshot” as a recovery point. The Recovery Point Objective is the maximum time you can afford between snapshots, i.e. how much data loss you can tolerate. The more recovery points you have, the more options you have, especially against ransomware. The Recovery Time Objective, by contrast, is the maximum time desired for full recovery and restoration at a secondary site, the point at which your business is back up and running. The RTO is generally what you must work with, in that the boot time and sequencing of servers can only be parallelized so far. Thus, to have the most effective recovery plans, you will need to know, realistically, how often you can take a snapshot and how long it takes to get the secondary site functional.
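
To make the two objectives concrete, the following Python sketch checks a set of recovery points against an RPO and estimates a recovery time to compare against an RTO. It is a simplified model under stated assumptions: the timestamps, boot times, and function names are made up for illustration, and servers are assumed to boot in sequential stages with everything inside a stage booting in parallel.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(recovery_points):
    """Largest gap between consecutive recovery points (the data at risk)."""
    points = sorted(recovery_points)
    gaps = (later - earlier for earlier, later in zip(points, points[1:]))
    return max(gaps, default=timedelta(0))


def estimated_recovery_minutes(stages):
    """Estimate recovery time for staged boots.

    `stages` is a list of lists of boot times in minutes. Servers inside a
    stage boot in parallel, so a stage takes as long as its slowest server;
    stages run one after another.
    """
    return sum(max(stage) for stage in stages if stage)


if __name__ == "__main__":
    # Hypothetical snapshot history for one server.
    points = [
        datetime(2024, 1, 1, 8, 0),
        datetime(2024, 1, 1, 9, 0),
        datetime(2024, 1, 1, 11, 0),  # the 10:00 snapshot was missed
        datetime(2024, 1, 1, 12, 0),
    ]
    rpo = timedelta(hours=1)
    loss = worst_case_data_loss(points)
    print(f"worst-case data loss: {loss}, RPO met: {loss <= rpo}")

    # Hypothetical boot times: one server, then two in parallel, then two more.
    stages = [[10], [15, 5], [8, 12]]
    rto_minutes = 60
    total = estimated_recovery_minutes(stages)
    print(f"estimated recovery: {total} min, RTO met: {total <= rto_minutes}")
```

The stage model also shows why recovery time can only be parallelized so far: adding servers to a stage costs nothing until one of them becomes the slowest, but every additional stage adds its full duration to the total.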

Too often people assume they can create a recovery point every 15 minutes or every 30 minutes. As noted, the truth is that recovery point and recovery time depend on your underlying infrastructure. I have put together disaster recovery plans with companies that demanded 15-minute recovery point objectives on an SQL server whose resources were already fully saturated by its day-to-day workload. These servers had neither spare RAM nor compute resources available to add other services, let alone frequent backup processes. It was an unreasonable request because the customer refused to increase the resource allocation to the server, as it was running fine. The customer demanded that the very intensive backup job run constantly to keep up with the transaction changes. While the software was lightweight enough to do quick backups, there were insufficient resources on that server to accommodate the request.

When setting expectations for recovery point and recovery time objectives, it is important to properly understand resource allocation and to temper expectations. With better understanding in place, you will know when you can reliably get a new recovery point, get it to the cloud/secondary location, and have it available as an option in your greater disaster recovery plan, and eventually in your business continuity plan.
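
One way to sanity-check a requested recovery point interval is a back-of-the-envelope calculation like the Python sketch below. It assumes, purely for illustration, that a backup job and its upload to the secondary location run back to back without overlapping, and that you have measured how long the backup takes and how much data changes per cycle. All numbers and function names are hypothetical, and the sketch ignores the CPU and RAM headroom problem described above, which has to be checked separately.

```python
def transfer_minutes(changed_gb, uplink_mbps):
    """Minutes needed to push `changed_gb` gigabytes over an uplink.

    Uses decimal units: 1 GB = 8,000 megabits.
    """
    megabits = changed_gb * 8 * 1000
    return megabits / uplink_mbps / 60


def rpo_is_achievable(rpo_minutes, backup_minutes, changed_gb, uplink_mbps):
    """Return (achievable, cycle_minutes) for one backup-and-replicate cycle.

    A recovery point only counts once it is at the secondary location, so
    the cycle is the backup time plus the transfer time.
    """
    cycle = backup_minutes + transfer_minutes(changed_gb, uplink_mbps)
    return cycle <= rpo_minutes, cycle


if __name__ == "__main__":
    # Hypothetical measurements: 12-minute backup job, 3 GB of changes per
    # cycle, 100 Mbps uplink.
    for rpo in (15, 30):
        ok, cycle = rpo_is_achievable(rpo, backup_minutes=12,
                                      changed_gb=3, uplink_mbps=100)
        print(f"RPO {rpo} min: cycle takes {cycle:.0f} min, achievable: {ok}")
```

With these example numbers a 15-minute objective cannot be met even before server load is considered, while a 30-minute objective leaves some margin.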

Disaster Recovery Services

Once you have your overall business continuity plan drilled down into an IT disaster recovery plan, and have that DR plan fleshed out into your recovery point objectives and recovery time objectives, it is time to hunt for a service that fits your needs. At the beginning of this post, I mentioned different types of disasters. The key here is to figure out what you are susceptible to and plan accordingly. Are you in a place where you do not have to deal with many natural disasters? Does your field encounter a lot of potential ransomware schemes? How about your users and clients: are they handling your data securely? These questions are intended to help you find the right service for your needs.

As for DR services, there are a lot to choose from. They range from simple local backup, to offsite replication, to a combination of both, to failing over into another infrastructure. Some services provide public cloud destinations, private cloud destinations, or even vendor cloud destinations. It is important to narrow down what you need in order to understand which services to go with.

Often, these questions are already answered for you. Do you answer to a governing body whose privacy or compliance rules dictate how your data must be managed? Does the vendor you are considering actually hold the regulatory compliance and attestations it claims, or is the salesperson just nodding along to get you to bite?

If you find yourself simply needing to replicate data offsite, with little need to be fully recovered within hours, a simple replication tool may suffice. If you find yourself needing something more, perhaps a reimaging of your server, but have no need for failover, and 24 hours of downtime is acceptable, then a backup that can reimage the server should be enough. For clients that need their servers up and running within 24 hours, you will need a disaster recovery solution, preferably something that can virtualize your source bare metal or VMs on the fly.

Infrascale Resolves Disaster Recovery Challenges

Once you know the problem you are solving, the tough part becomes who to trust with your data. Infrascale has been in the business of disaster recovery since 2011. In that time, we have amassed a portfolio of products that cover every disaster recovery scenario.

Infrascale Cloud Backup provides the ability to directly replicate endpoint data to the cloud, over an encrypted tunnel, where it will land with 256-bit AES encryption at rest, in the Infrascale data center. It also offers a local backup option and ransomware detection built in. Its focus is to effortlessly replicate data off endpoints — laptops, desktops, and mobile devices, whether Windows, Mac, iOS, or Android — and store it securely in the Infrascale cloud. With robust policy management, you will be able to remotely dictate scanning, backup, and retention rules. Coupled with bandwidth throttling tools, fully-controllable reporting/alerting, and added security services, you will be able to ensure full management over your endpoint devices.

Those that need to protect their entire infrastructure can use Infrascale Disaster Recovery. After a brief sizing exercise, Infrascale will assign an implementation specialist who will work with you to deploy your purpose-built Cloud Failover Appliance (CFA) at your on-premises location. Unlike our competitors, Infrascale ensures that someone will be there to hold your hand for a smooth deployment. The data from your servers will aggregate onto this CFA, where it will undergo multiple levels of deduplication, culminating in customer-level, global deduplication. This global deduplication creates a global block map, which we compare against the global block map of the cloud copy of your data. This underlying global block map technology allows Infrascale to easily replicate data to and from the cloud by pushing and pulling only the changed blocks of data. These blocks will then, on the fly, be stitched together to create incremental images, differential images, or even synthetic full images.
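
The general idea of comparing block maps so that only changed blocks cross the wire can be sketched in a few lines of Python. This is a generic illustration of hash-based block comparison and deduplication, not Infrascale’s actual implementation: the tiny block size, the choice of SHA-256, and every function name here are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; unrealistically small so the example is easy to read


def block_map(data, block_size=BLOCK_SIZE):
    """Map each block index to the SHA-256 digest of that block."""
    return {
        offset // block_size: hashlib.sha256(
            data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    }


def changed_blocks(local_map, remote_map):
    """Indexes of blocks that are new or differ from the remote copy."""
    return sorted(index for index, digest in local_map.items()
                  if remote_map.get(index) != digest)


def dedup_store(data, block_size=BLOCK_SIZE):
    """Keep one copy of each unique block, keyed by its digest."""
    store = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        store.setdefault(hashlib.sha256(block).hexdigest(), block)
    return store


if __name__ == "__main__":
    cloud_copy = b"AAAABBBBCCCC"
    local_copy = b"AAAAXXXXCCCCDDDD"  # block 1 modified, block 3 appended

    to_send = changed_blocks(block_map(local_copy), block_map(cloud_copy))
    print("blocks to replicate:", to_send)

    unique = dedup_store(b"AAAABBBBAAAA")  # the block "AAAA" appears twice
    print("unique blocks stored:", len(unique))
```

In this toy run only two of the four local blocks need to be replicated, and the repeated block is stored once, which is the property that makes frequent recovery points affordable in bandwidth and storage.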

These images can be mounted at a moment’s notice to recover individual files by pulling them through your browser, pushing them to other devices, or making them available on a mapped network drive. They can also be booted on the local appliance or in the cloud, without a need to reach out to support to prepare for the event. Once they are booted, you can use VPN (Virtual Private Network) access to reach the booted machines and keep production running.

Infrascale Cloud Application Backup, another Infrascale offering, will protect your software-as-a-service applications: Microsoft 365 (including Exchange, Teams, OneDrive, and SharePoint), Google’s G Suite of online applications, and even services like Salesforce. The goal is to create “offsite” backups from the vendor’s ecosystem to supply end customers with longer retention and granular recovery. By simply connecting to your vendor’s applications through the backup solution, Infrascale can pull the data, encrypt it, and store it for easy retrieval.

This entire suite of products is centrally managed and monitored through the Infrascale Dashboard. This portal gives you a one-stop shop for all your backup and recovery needs. The Dashboard also connects to ConnectWise or Autotask for ease of management, ticketing, and billing integration. The Infrascale Dashboard is multi-tenant, giving your clients and end users a secure place to manage their data without getting in the way of one another or altering something when a stronger policy is put in place by their administrator.

To learn more about the Infrascale suite of backup and disaster recovery products, we encourage you to check out the links below and reach out to our team with any questions.