Disaster Recovery as a Service (DRaaS) in VMware – The Full Picture

Disaster Recovery as a Service (DRaaS) is a type of service offering that provides Disaster Recovery (DR) capabilities in the cloud. You may have already read our dedicated blog post on what disaster recovery as a service is, published during VMworld 2020.

Traditionally, organizations have distributed critical systems across multiple sites or locations to protect against failures. This approach has been effective but expensive: it means buying the same hardware multiple times to stand up identical infrastructure. Disaster recovery as a service provides the orchestration and replication software required to fail over services to standby or on-demand resources in the cloud. In this article, we will run down the basics of DR and break down the DRaaS options available in VMware. Let’s get to it!

Using DRaaS to Prepare for a Disaster

    • The main benefit of DRaaS is removing the need for additional dedicated data centers or hosting facilities, along with the duplicated hardware they contain.
    • The resources required for failover are maintained and allocated by the service provider, who will typically have a global footprint with a fully resilient setup.
    • The service provider should provide the replication and orchestration capabilities needed to restore services into the cloud.
    • Ideally, the provider should also add further value, such as compliance checks and restore tests.
    • Recovery plans can be standardized across multiple sites, removing the heavy lifting of creating a dedicated disaster recovery plan.

Using cloud services also presents several generic benefits. As organizations move away from racking and stacking hardware on-premises, they can benefit from:

    • Quicker time-to-market or project delivery, by freeing up staff from maintaining the underlying infrastructure.
    • Economies of scale, using cheaper commodity infrastructure or paying for on-demand consumption.
    • Shifting from large Capital Expenditure to predictable, recurring Operating Expenditure funding for IT.

What is RPO and RTO?

Any kind of disaster recovery needs to be measured against Service Level Agreements (SLAs), along with Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs).

    • Recovery Point Objective (RPO): The maximum acceptable data loss, expressed as time; in other words, the point in time to which data or application state must be recoverable. For example, a system with critical, frequently changing stateful data may need an RPO of a few minutes, a system where little changes could tolerate 4 hours, and a non-critical system that can incur data loss could accept 1 day or even longer.
    • Recovery Time Objective (RTO): The maximum acceptable time taken to restore a service after a disruption. Again, a mission-critical system that cannot afford downtime may need a low RTO, whereas a test system may allow for an RTO of days or even weeks before it is available again.

Each organization will have multiple RPO and RTO values for its different services. There is typically a trade-off between cost and recovery objectives: tighter RPOs and RTOs generally cost more. When examining on-premises DR or disaster recovery as a service, the RPO and RTO offerings should be aligned with business needs.
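To make the RPO/RTO comparison concrete, here is a minimal sketch that checks whether a DR option meets one service’s objectives. It assumes recovery points are taken on a fixed interval and that each point takes some time to land at the recovery site, so the worst-case data loss is one full interval plus that replication lag (assuming the lag is shorter than the interval). The class names, field names, and figures are hypothetical and not tied to any particular product.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ServiceObjectives:
    """Business targets for one service (illustrative values only)."""
    name: str
    rpo: timedelta  # maximum tolerable data loss
    rto: timedelta  # maximum tolerable downtime


@dataclass(frozen=True)
class DrCapability:
    """What a DR option can deliver (hypothetical figures)."""
    snapshot_interval: timedelta  # time between recovery points
    replication_lag: timedelta    # time for a point to reach the recovery site
    recovery_time: timedelta      # measured or estimated time to bring the service up


def worst_case_data_loss(cap: DrCapability) -> timedelta:
    # If the failure hits just before the newest point finishes replicating,
    # the last usable point is one interval older: interval + lag of changes lost.
    return cap.snapshot_interval + cap.replication_lag


def meets_objectives(svc: ServiceObjectives, cap: DrCapability) -> tuple[bool, bool]:
    """Return (rpo_met, rto_met) for a service / DR option pairing."""
    return (worst_case_data_loss(cap) <= svc.rpo, cap.recovery_time <= svc.rto)


orders = ServiceObjectives("orders-db", rpo=timedelta(minutes=15), rto=timedelta(hours=1))
option = DrCapability(
    snapshot_interval=timedelta(minutes=30),
    replication_lag=timedelta(minutes=5),
    recovery_time=timedelta(minutes=45),
)
print(meets_objectives(orders, option))  # (False, True)
```

Here the option recovers quickly enough (45 minutes against a 1-hour RTO) but could lose up to 35 minutes of data against a 15-minute RPO, which is exactly the kind of gap to flag before choosing a service.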

How Does Disaster Recovery as a Service Work?

As a typical service model, Disaster Recovery as a Service will replicate (and convert if required) physical or virtual servers to cloud-hosted infrastructure. In the event of a disaster or other incident that impacts the availability of the on-premises service, failover to the cloud-based copy is initiated to maintain business continuity.
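The lifecycle described above can be modelled as a small state machine: workloads are protected while replicating, move to a failed-over state when the cloud copy is brought up, and return to protected once failback completes. This is a simplified sketch; the state and action names are hypothetical and do not correspond to any vendor’s API, and real services typically add further states, for example for test failovers.

```python
from enum import Enum


class DrState(Enum):
    PROTECTED = "protected"        # primary site running, replicating to the cloud
    FAILED_OVER = "failed_over"    # services running from the cloud copy
    FAILING_BACK = "failing_back"  # changes syncing back to the primary site


# Allowed transitions in this simplified, hypothetical model.
TRANSITIONS = {
    (DrState.PROTECTED, "failover"): DrState.FAILED_OVER,
    (DrState.FAILED_OVER, "start_failback"): DrState.FAILING_BACK,
    (DrState.FAILING_BACK, "complete_failback"): DrState.PROTECTED,
}


def apply(state: DrState, action: str) -> DrState:
    """Return the next state, rejecting actions that are invalid in the current one."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action!r} while {state.value}") from None


state = DrState.PROTECTED
for action in ["failover", "start_failback", "complete_failback"]:
    state = apply(state, action)
print(state)  # DrState.PROTECTED
```

Keeping the transitions in a table makes invalid operations, such as completing a failback that never started, fail loudly instead of silently.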

Example disaster recovery as a service High-Level Setup

The low-level details vary depending on the exact service and provider. VMware vSphere is the most common hypervisor platform for server virtualization in the data center, so let’s look at some of the Disaster Recovery as a Service options available for VMware workloads.

VMware Cloud Disaster Recovery (VCDR)

VMware Cloud Disaster Recovery (VCDR) is perhaps the truest form of managed DRaaS. It works as follows:

    • The DRaaS Connector VM snapshots workloads from the on-premises VMware environment into a cloud-based scale-out file system.
    • The customer pays for the number of VMs they are protecting, and the total amount of storage they have used.
    • The Software-as-a-Service (SaaS) orchestrator and control plane allows the customer to specify exactly how many recovery points they would like to retain, at what frequency, and for how long.
    • Should DR need to be invoked, the scale-out file system is mounted to dedicated VMware Cloud on AWS nodes, and the workloads are powered on.
    • The recovery nodes can either be already running or deployed automatically on-demand.
    • When the protected site or hardware is available again, a delta-based failback can be scheduled.
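Because the customer chooses the snapshot frequency and retention period, the number of recovery points kept (and therefore the storage consumed and paid for) follows directly from the policy. The sketch below lists which snapshot times a policy would still retain at a given moment. The `ProtectionPolicy` shape and the `retained_points` helper are hypothetical illustrations, not the VCDR API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ProtectionPolicy:
    """Hypothetical policy: snapshot every `frequency`, keep each for `retention`."""
    frequency: timedelta
    retention: timedelta


def retained_points(policy: ProtectionPolicy, first: datetime, now: datetime) -> list[datetime]:
    """Return the snapshot times still retained at `now`, oldest first."""
    if policy.frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    points = []
    t = first
    while t <= now:
        # A snapshot is kept until it is older than the retention period.
        if now - t <= policy.retention:
            points.append(t)
        t += policy.frequency
    return points


policy = ProtectionPolicy(frequency=timedelta(hours=4), retention=timedelta(days=1))
points = retained_points(policy, first=datetime(2021, 1, 1), now=datetime(2021, 1, 3))
print(len(points))  # 7: every 4 hours from 2021-01-02 00:00 to 2021-01-03 00:00
```

Halving the frequency or doubling the retention roughly doubles the number of retained points, which is a quick way to reason about the storage side of the bill.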

You’ll notice that, other than policy customizations, the provider manages all the failover infrastructure. At the time of writing, VCDR is only available with VMware Cloud on AWS as the recovery site. In the future, this is expected to be extended to other VMware-based cloud platforms, such as Microsoft Azure (Azure VMware Solution) and Google Cloud (Google Cloud VMware Engine).

VMware Cloud Disaster Recovery Setup

As well as the typical disasters that spring to mind, such as power outages, natural disasters, hardware failures, or human error, VCDR is well suited to ransomware recovery.

The normal VM-based replication model for disaster recovery will replicate any corruption or encryption applied by ransomware, rendering the replicas useless. Furthermore, ransomware will often seek out backups as a first point of attack.

VCDR uses the following methods to provide ransomware recovery:

    • User-defined snapshot frequency and retention, for a deep history of data and application state.
    • Immutable snapshots that cannot be altered once taken.
    • Instant VM power-on, for faster experimentation with recovery points.
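Recovering from ransomware usually means working backwards through recovery points until a clean one is found, which is where frequent, immutable snapshots and fast power-on help. In the sketch below, a frozen dataclass stands in for an immutable snapshot, and `is_clean` is a hypothetical placeholder for whatever validation you would run after powering on a recovery point (for example, a malware scan); it is not part of any VMware API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Snapshot:
    vm: str
    taken_at: datetime


def find_clean_point(
    snapshots: Iterable[Snapshot], is_clean: Callable[[Snapshot], bool]
) -> Optional[Snapshot]:
    """Test recovery points newest-first and return the first one that passes."""
    for snap in sorted(snapshots, key=lambda s: s.taken_at, reverse=True):
        if is_clean(snap):
            return snap
    return None  # no clean recovery point within the retained history


# Example: half-hourly snapshots; assume encryption began at 10:10.
snaps = [Snapshot("fileserver01", datetime(2021, 6, 1, 9, m)) for m in (0, 30)]
snaps += [Snapshot("fileserver01", datetime(2021, 6, 1, 10, m)) for m in (0, 30)]
infected_at = datetime(2021, 6, 1, 10, 10)
clean = find_clean_point(snaps, lambda s: s.taken_at < infected_at)
print(clean.taken_at)  # 2021-06-01 10:00:00
```

Searching newest-first minimizes data loss; the deeper the retained history, the more likely a clean point still exists if the infection went unnoticed for a while.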

VMware Cloud Disaster Recovery was first announced at VMworld 2020, following VMware’s acquisition of Datrium. Functionality is likely to grow at a fast pace; at VMworld 2021, the following new features were announced:

    • 30-minute RPOs, for critical applications with higher change rates, providing a restore point every 30 minutes.
    • File-level recovery, to accelerate ransomware recovery by restoring files or folders without powering on the VM.
    • Integrated VMware Cloud on AWS protection, enabling region or site failover.

VMware Site Recovery Manager (SRM)

VMware Site Recovery Manager (SRM) has long been used by VI admins on-premises to provide VM failover between sites. It commonly uses vSphere Replication, which is included with most vSphere editions, to replicate VMs between sites, each managed by its own vCenter Server instance.

Site Recovery Manager can run custom scripts, re-IP virtual machines, check dependencies, and run failover tests. The big difference between Site Recovery Manager and a managed disaster recovery as a service solution is that SRM requires the recovery site to be online and available, both to receive the replicated VM data and to host the placeholder VMs ready for recovery.
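SRM recovery plans control startup order through priority groups and per-VM dependencies, so that, for example, a database is running before the application servers that use it. The sketch below shows the general idea with a topological sort from Python’s standard `graphlib` module (Python 3.9+), grouping VMs into startup waves. The VM names are hypothetical, and this is not how SRM is implemented internally.

```python
from graphlib import TopologicalSorter


def startup_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into waves; every VM starts only after all of its dependencies."""
    ts = TopologicalSorter(depends_on)  # maps each VM to the VMs it depends on
    ts.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # VMs whose dependencies are all started
        waves.append(ready)
        ts.done(*ready)
    return waves


deps = {
    "web01": {"app01"},
    "app01": {"db01"},
    "db01": set(),
    "dns01": set(),
}
print(startup_waves(deps))  # [['db01', 'dns01'], ['app01'], ['web01']]
```

VMs in the same wave have no dependencies on each other, so they can be powered on in parallel, similar in spirit to VMs sharing a priority group.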

Site Recovery Manager can also be used to recover into the cloud, in the same way it is used on-premises, by installing SRM at both sites. In this type of setup, the customer is responsible for installing and configuring SRM at both sites, while the cloud provider maintains the underlying infrastructure.

Whilst this may not be fully managed disaster recovery as a service, and could perhaps be described as self-service DRaaS, it does provide the flexibility to support:

    • Recovery to Microsoft Azure (Azure VMware Solution)
    • Recovery to Google Cloud (Google Cloud VMware Engine)
    • Recovery to Oracle Cloud (Oracle Cloud VMware Solution)
    • Recovery to other VMware Cloud Provider Partners, such as IBM Cloud
    • Recovery to managed service providers, and private, local, or sovereign clouds

VMware Site Recovery Manager Setup

VMware Site Recovery

VMware Site Recovery provides the same functionality and benefits as SRM, except that the solution is provided in Software-as-a-Service (SaaS) form.

At the time of writing, VMware Site Recovery is only available with VMware Cloud on AWS, either between VMware Cloud on AWS sites or as a hybrid site pairing between on-premises and VMware Cloud on AWS. These disaster recovery models sit on the spectrum between assisted DRaaS and self-service DRaaS.

In the former example, VMware Site Recovery is enabled from the VMware Cloud console without any installation. VM-level failover can be configured between VMware Cloud on AWS Availability Zones or regions.

In the latter example, the customer installs SRM on-premises and enables VMware Site Recovery for the VMware Cloud on AWS recovery site. The hybrid setup works as follows:

    • The customer installs Site Recovery Manager at their primary or protected site. The customer is responsible for maintaining this side of the Site Recovery installation.
    • The customer enables Site Recovery from the VMware Cloud Services Portal (CSP) for the recovery site. Site Recovery for the recovery site is SaaS, so no further installation is needed.
    • A site pairing is created between the on-premises site and the VMware Cloud on AWS instance.
    • Protection policies and recovery plans are created to define the VMs in scope for failover, and any mappings or dependencies.
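Mappings between the two sites often include re-IP rules, because the recovery site in VMware Cloud on AWS may use different subnets from the protected site. The sketch below shows one simple rule type, translating an address from a protected-site subnet to an equally sized recovery subnet while keeping the host portion. The mapping format and the `re_ip` function are hypothetical illustrations, not Site Recovery’s configuration format.

```python
import ipaddress


def re_ip(address: str, mappings: dict[str, str]) -> str:
    """Translate an address from a protected-site subnet to its mapped recovery
    subnet, keeping the same host offset. Mapped subnets must be the same size."""
    ip = ipaddress.ip_address(address)
    for src, dst in mappings.items():
        src_net = ipaddress.ip_network(src)
        dst_net = ipaddress.ip_network(dst)
        if ip in src_net:
            if src_net.prefixlen != dst_net.prefixlen:
                raise ValueError(f"{src} and {dst} are different sizes")
            offset = int(ip) - int(src_net.network_address)
            return str(dst_net.network_address + offset)
    raise LookupError(f"no mapping covers {address}")


# Hypothetical mapping: protected-site app subnet -> recovery-site app subnet.
subnet_map = {"10.0.1.0/24": "172.16.1.0/24"}
print(re_ip("10.0.1.25", subnet_map))  # 172.16.1.25
```

Preserving the host offset keeps addressing predictable after failover, which makes it easier to update DNS records or firewall rules consistently.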

VMware Site Recovery Setup

Is Disaster Recovery as a Service Right for You?

As businesses moved more of their operations online in response to the Covid-19 pandemic, cloud adoption accelerated, and DRaaS was no exception. Organizations starting out in the cloud often host test and development workloads first, with many then adding disaster recovery as a use case, using the cloud as a second or third site. Driven by this demand and by newer recovery scenarios such as ransomware, the Disaster Recovery as a Service market continues to grow.

Right now, VMware Cloud Disaster Recovery and VMware Site Recovery Manager are the mainstream options for VMware workloads. The best approach to disaster recovery depends on each organization’s requirements, and there are plenty of alternatives:

    • Azure JetStream will back up VMs to Azure Blob Storage and restore them into Azure VMware Solution (AVS).
    • Azure Site Recovery (ASR) converts VMware VMs to Azure VMs for disaster recovery.
    • AWS CloudEndure replicates physical or virtual machines into low-cost EC2 staging instances and converts them to production in the event of a failover.
    • VMware vCloud Availability is a tool for service providers and VMware Cloud Provider Partners, enabling multi-tenant recovery between sites.
    • VMware vSphere Replication on its own provides an asynchronous replication engine for VMs, which could be complemented with third-party software.
    • Backup-as-a-Service (BaaS) provides data backup to the cloud, and whilst it doesn’t provide restoration of infrastructure services, it could be worked into a disaster recovery plan with other solutions.

Making a Choice

In summary, understanding your requirements and existing services is the first step to identifying whether Disaster Recovery as a Service is the right option for you, and how you can use DRaaS to prepare and protect against a disaster. The next step is to understand the available options and align them with your own SLAs, RPOs, and RTOs, as well as any dependencies and regulatory requirements.
