Disaster Recovery Orchestration

The Importance of Orchestration

Many have heard the adage that in this world nothing can be said to be certain, except death and taxes. However, in 2020, we may want to also include cyber-attacks and other disasters that jeopardize your data. The global pandemic has seen a huge rise in people working from home, shopping online, and being more digitally connected than ever. Unfortunately, this presents an ideal opportunity for bad actors seeking to create havoc. 68% of business leaders feel their cybersecurity risks are increasing, and on average only 5% of companies feel they are protected. It also presents more complexity for IT — managing more remote workers and equipment — on top of their data center and business operations.

Uptime is an operational imperative because its inverse, downtime, carries enormous costs for a business. Any form of downtime, from an Exchange crash to a site-wide disaster (tornado, hurricane, flood) to a ransomware infection, can cost an organization dearly in lost revenue and productivity.

However, an organization can dramatically reduce the downtime and stress associated with these incidents if: a) it has implemented a disaster recovery (DR) solution, b) that solution has orchestration capabilities, and c) failover testing has shown that the DR solution and its orchestration perform optimally.

When examining potential DR solution providers, it is increasingly important to find objective measures to separate the contenders from the pretenders. One of the key differentiators is how a solution provider delivers orchestration — the orderly recovery of a server environment during an outage. Orchestration ensures that critical servers, applications, and their dependencies come online in an automated fashion, without incident. When looking at a vendor’s failover and failback features, pay special attention to orchestration and how much customization and control you have in the orchestration process. These features can save time, save energy, and bring your critical data and applications back online with minimal loss to your business.

Let's start with a maxim: an ounce of prevention is worth a pound of cure. When disaster strikes or critical systems crash, IT administrators have to be thoughtful about how, and in what order, they restore applications. This needs to be planned in advance, because the order of operations is crucial for seamless system restoration. For example, if your environment uses a Dynamic Host Configuration Protocol (DHCP) server to manage leases for your machines, that server should be among the first brought online, because it assigns IP addresses and provides configuration information. You may also want your Active Directory (AD) server to come online shortly thereafter, if not concurrently, to automate network management of user data, security, and distributed resources.

After you resuscitate these core systems, you will want to restore your production workloads, such as SQL Server, Exchange, and other mission-critical apps. Then you can boot your secondary applications. Order clearly matters, and orchestration is the means by which DR solutions restore applications in a predetermined sequence.
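
One way to reason about this ordering is to treat it as a dependency graph. The following is a minimal Python sketch under stated assumptions, not any vendor's implementation: the server names and the dependency map are hypothetical and simply mirror the DHCP, AD, production, and secondary ordering described above. It uses the standard library's graphlib module (Python 3.9 or later) to group machines into boot tiers.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each server lists the servers it needs first.
DEPENDENCIES = {
    "dhcp": set(),
    "active-directory": {"dhcp"},
    "sql-server": {"active-directory"},
    "exchange": {"active-directory"},
    "intranet-wiki": {"sql-server"},
}


def boot_tiers(dependencies):
    """Return a list of tiers; each tier is a sorted list of servers
    whose dependencies all sit in earlier tiers."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    tiers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything bootable right now
        tiers.append(ready)
        sorter.done(*ready)  # mark the tier as up, unlocking its dependents
    return tiers


if __name__ == "__main__":
    for number, tier in enumerate(boot_tiers(DEPENDENCIES), start=1):
        print(f"Tier {number}: {', '.join(tier)}")
```

Machines in the same tier have no dependencies on each other, so they could be powered on together. If AD does not actually need DHCP in your environment, removing that entry from the map would place both in the first tier.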

Not all vendors treat orchestration equally; you have to uncover if, and how, a DR solution vendor can deliver this functionality. There are four core components of orchestration:

  • Runbooks: Most cloud recovery providers offer a simple DR runbook that presets the order in which your systems (VMs) recover. The runbook defines a group of machines that are powered on simultaneously with a single command. The real power of orchestration, however, is the ability to determine the actual order, not just a group of applications that boot simultaneously. This is where scripting comes into play.
  • Scripting: To complement a runbook, IT can create simple, customized scripts (basic commands) that execute more complex configurations, covering everything required to fully automate recovery. For example, scripts can ensure that machines not served by a DHCP server are rebooted with their proper network configuration, such as IP and MAC addresses.
  • Testing: Another key component of orchestration is the ability to test the failover process and ensure the runbook and scripts work as expected. Unfortunately, many DR vendors charge for DR tests or require formal disaster declarations to perform these tests. Increasingly, IT administrators are looking for a self-service failover solution that puts the control back in their hands. You’ll want to test your orchestration periodically after the initial setup. System variables continuously change (for example, when you deploy new service packs), so testing is not a one-and-done activity.
  • Failback: After your production servers are running, IT is freed up to rebuild your hardware in anticipation of application failback. Once the hardware has been properly configured (post disaster), it’s time to restore applications and their operating systems. If it’s a physical machine, you can use a USB drive or disk to recover from a preinstallation environment (PE). If it’s a VM, you can simply push the guest back to its corresponding host. All of this can be done while capturing any changes users made while working with the ‘booted’ image during the outage.
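
To make the runbook, scripting, and testing ideas above concrete, here is a minimal Python sketch of a runbook as ordered boot groups with optional waits and post-boot scripts. It is an illustration only and does not represent any vendor's API: the BootGroup class, the run_runbook function, the machine names, and the script name set-static-ip.ps1 are all hypothetical. The dry-run mode records what would happen without touching any machine, which is one simple way to rehearse a sequence.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BootGroup:
    """Machines powered on together, then optional scripts and a wait."""
    name: str
    machines: List[str]
    wait_seconds: int = 0  # pause before the next group starts
    post_boot_scripts: List[str] = field(default_factory=list)


def run_runbook(groups, power_on, run_script, sleep, dry_run=True):
    """Walk the groups in order and return a log of the actions taken.

    power_on, run_script and sleep are injected callables (hypothetical
    stand-ins for a real DR platform); in dry-run mode none is called.
    """
    log = []
    for group in groups:
        for machine in group.machines:
            log.append(f"power on {machine}")
            if not dry_run:
                power_on(machine)
        for script in group.post_boot_scripts:
            log.append(f"run {script}")
            if not dry_run:
                run_script(script)
        if group.wait_seconds:
            log.append(f"wait {group.wait_seconds}s")
            if not dry_run:
                sleep(group.wait_seconds)
    return log


# Hypothetical runbook mirroring the core / production / secondary ordering.
RUNBOOK = [
    BootGroup("core", ["dhcp", "active-directory"], wait_seconds=120),
    BootGroup("production", ["sql-server", "exchange"], wait_seconds=60,
              post_boot_scripts=["set-static-ip.ps1"]),
    BootGroup("secondary", ["intranet-wiki"]),
]

if __name__ == "__main__":
    for line in run_runbook(RUNBOOK, None, None, None, dry_run=True):
        print(line)
```

Passing the actions in as callables keeps the sequencing logic separate from whatever platform actually powers on machines, so the same runbook can be rehearsed in dry-run mode and then executed for real.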

At Infrascale, we’ve invested in orchestration to make our DR solution the easiest and most customizable on the market. We have enabled runbooks to boot specific VMs and groups of VMs, and we’ve built a simple drag-and-drop interface that lets you build out your orchestration sequencing.

 

Figure: Example Orchestration Scenario

Figure: Example Orchestration Scenario – Boot Group & Wait

Figure: Example Orchestration Scenario – Server Actions

 

For further information, see orchestration overview and details.

Beyond the runbook and boot sequencing, Infrascale offers unlimited, on-demand testing so you can freely test and retest your orchestration… on your time, and for no extra fee! We even offer a guided disaster recovery service to help you ensure your plan is ready to go!
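
Frequent testing pairs well with a cheap sanity check that can run before each failover test. The sketch below is a hypothetical Python helper, not part of any Infrascale product: validate_boot_plan and its input shapes are assumptions made for illustration. It flags machines that are missing from the plan, listed twice, or scheduled no later than something they depend on.

```python
def validate_boot_plan(groups, dependencies):
    """Return a list of problems found in an ordered boot plan.

    groups: list of lists of machine names, in boot order.
    dependencies: dict mapping a machine to the machines it needs first.
    """
    problems = []
    position = {}  # machine name -> index of the group it boots in
    for index, group in enumerate(groups):
        for machine in group:
            if machine in position:
                problems.append(f"{machine} appears in more than one group")
            else:
                position[machine] = index
    for machine, needed in dependencies.items():
        if machine not in position:
            problems.append(f"{machine} is missing from the plan")
            continue
        for dep in sorted(needed):
            if dep not in position:
                problems.append(f"{machine} needs {dep}, which is not in the plan")
            elif position[dep] >= position[machine]:
                problems.append(f"{machine} boots before or with its dependency {dep}")
    return problems


if __name__ == "__main__":
    deps = {"dhcp": set(), "ad": {"dhcp"}, "sql": {"ad"}}
    for problem in validate_boot_plan([["dhcp", "ad"], ["sql"]], deps):
        print(problem)
```

An empty result means the plan is at least internally consistent; it does not replace an actual failover test, which is still the only way to confirm that machines and scripts behave as expected.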

Because of the features and flexibility provided by Infrascale Disaster Recovery, our clients get very excited about the power of our solution. One such client, a military contractor, is required to maintain constant uptime. Unlike most companies working with the government, they cannot shut their systems down, even for testing, because their product is so vital. Yet their regulatory compliance also requires them to test! Prior to working with Infrascale, this company was forced to take backups of its entire infrastructure, fly the entire team out to its colocation facility in the middle of the country, and test, all within a two-week window. Not only was this costly, but the sensitivity of the data and the time constraints also made it difficult. When Infrascale came along, we were able to assist this client in completely pre-configuring failover in the cloud, so they could meet the strict time constraints and do it all securely and safely. Now, instead of dreading the failover test every year, they treat it as a vacation. What used to take weeks to complete now takes just a few hours!

Punchline: As you give disaster recovery (DR) solutions a closer look, you must ask any prospective vendor how they manage the orchestration process. Ask whether their offering goes beyond simple DR runbooks and whether they can help you create a comprehensive business continuity and DR playbook. When orchestration is well planned, coordinated, and tested, it can dramatically reduce the amount of downtime for any type of micro- or macro-disaster. Just as important, it will have a positive impact on your stress level by giving you the confidence of knowing that you can recover from anything thrown your way.