VMware recently unveiled a new on-demand disaster recovery (DR) offering that acts as a SaaS solution for workload protection in the cloud. The decision was driven by a trend showing traditional DR solutions to be complex, expensive, and unreliable. In summer 2020, VMware acquired the backup company Datrium to combine its cost-optimized DRaaS solution with VMware Cloud’s consistent infrastructure and operations.
As opposed to Site Recovery, VMware Cloud Disaster Recovery will allow customers to replicate their workloads to cheap cloud storage and restore them to a VMware Cloud on AWS SDDC, which can be spun up on demand to improve TCO.
The solution will support up to 1,500 VMs across multiple SDDC clusters with DR health checks triggered every half hour. There will also be the possibility to get automated audit reports to comply with internal policies.
The three pillars
VMware Cloud Disaster Recovery is introduced as an “easy-to-use cloud-based solution” based on 3 main pillars:
Most DR scenarios are convoluted and take time to execute. VMware Cloud Disaster Recovery aims to provide fast recovery of workloads by leveraging live mounts of NFS volumes in VMware Cloud on AWS and a Pilot-Light option, which ensures that a limited set of vSphere hosts is available for RTO-sensitive workloads.
Both VMware Cloud DR and the production infrastructure can be managed from vCenter, so administrators keep using the skills and management tools they already know. Managing the lifecycle of the DR software (SRM upgrades…) will no longer be an issue thanks to the SaaS management platform.
- Cloud Economics
Probably the most obvious of the 3 pillars is the economic aspect: VMware Cloud DR aims for a better TCO. Elastic cloud computing will allow customers to spin up a costly SDDC in the cloud only when needed. Because the workloads are replicated to cloud-native storage using Datrium’s technology, you will not have to pay for a standby infrastructure waiting for a disaster to happen. Thanks to delta-based replication, bandwidth usage is optimized and the cost reduced to the bare minimum.
Warm DRaaS vs Hot DRaaS
VMware Cloud DR is called warm DRaaS because the VMs are replicated to cloud-native storage and don’t require an infrastructure to run all the time.
It means that recovery times will most likely be longer than with traditional Site Recovery, but you will save a great deal on complexity, maintenance, and the cost of running a full-size SDDC in the cloud. Workloads that require aggressive RTO and RPO can still be protected by Site Recovery, since it keeps the infrastructure on the recovery site in a hot state, or by using a Pilot-Light SDDC (more on that later).
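The trade-off can be made concrete with a bit of arithmetic. The sketch below compares a hot site that pays for hosts all year with a warm setup that pays for storage all year and for hosts only during a failover. Every number in it is a made-up placeholder chosen for round results, not VMware or AWS pricing, and the function names are hypothetical; plug in your own quotes to get a meaningful comparison.

```python
def hot_dr_cost(host_hourly_rate: float, hosts: int, hours: float) -> float:
    """Cost of keeping recovery hosts running for the whole period."""
    return host_hourly_rate * hosts * hours


def warm_dr_cost(
    storage_rate_per_tb_month: float,
    protected_tb: float,
    months: int,
    host_hourly_rate: float,
    hosts: int,
    failover_hours: float,
) -> float:
    """Cost of cloud storage for the whole period plus hosts only while failed over."""
    storage = storage_rate_per_tb_month * protected_tb * months
    compute = host_hourly_rate * hosts * failover_hours
    return storage + compute


# Placeholder inputs (assumptions, NOT real prices).
HOST_RATE = 10.0       # currency units per host per hour
HOSTS = 3
STORAGE_RATE = 20.0    # currency units per TB per month
PROTECTED_TB = 50.0
FAILOVER_HOURS = 72.0  # three days spent running in the cloud after a disaster

hot = hot_dr_cost(HOST_RATE, HOSTS, hours=365 * 24)
warm = warm_dr_cost(STORAGE_RATE, PROTECTED_TB, 12, HOST_RATE, HOSTS, FAILOVER_HOURS)
print(f"hot: {hot:.0f}, warm: {warm:.0f}")
```

The point is the shape of the two formulas rather than the totals: the hot cost scales with hours in the year, while the warm cost scales with protected capacity and with how long you actually stay failed over.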
The way VMware Cloud DR works is not that different from Site Recovery Manager. The same concepts of protection groups, recovery plans, and protection sites still apply.
Replication: The data is replicated from the protected site using the DRaaS connector, which must be deployed and configured on-premises. It interacts with the SaaS orchestrator in the cloud. Once the connection is established, you can start working with protection groups and recovery plans in the SaaS orchestration component of VMware Cloud DR. The VMs will, in turn, be replicated to cheap cloud-native storage using delta-based copy.
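To illustrate what "delta-based copy" means in general, here is a minimal Python sketch: split the data into fixed-size blocks, hash each block, and ship only the blocks whose hash differs from the previous snapshot. This shows the generic technique only; it is not Datrium's or VMware's actual implementation, and the block size and function names (`block_hashes`, `delta_blocks`) are assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def delta_blocks(previous: bytes, current: bytes,
                 block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for blocks that are new or changed."""
    old = block_hashes(previous, block_size)
    new = block_hashes(current, block_size)
    changed = {}
    for index, digest in enumerate(new):
        # A block is sent if it did not exist before or its content changed.
        if index >= len(old) or old[index] != digest:
            start = index * block_size
            changed[index] = current[start:start + block_size]
    return changed


previous_snapshot = b"AAAABBBBCCCCDDDD"
current_snapshot = b"AAAAXXXXCCCCDDDDEEEE"
delta = delta_blocks(previous_snapshot, current_snapshot)
print(delta)  # {1: b'XXXX', 4: b'EEEE'}
```

Only 2 of the 5 blocks travel over the wire in this example, which is why delta-based replication keeps bandwidth usage low after the initial full copy.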
Recovery: To fast-track the recovery, the VMs can be started in VMC on AWS straight from the storage where the replicas reside using NFS mounts. VMware Cloud on AWS will then ensure that sufficient capacity is spun up in the SDDC to run the workloads.
- Always running: Existing SDDC in VMware Cloud on AWS.
- Just in time SDDC: Spin up a new SDDC only when it is needed. This option is better suited to workloads with relaxed recovery targets, because deploying the SDDC takes time and lengthens the RTO.
The Pilot-Light option offers the possibility to run an SDDC with a small footprint to accelerate the recovery of a reduced set of the most critical workloads. Note that it can be scaled up later on during testing or recovery.
As with SRM, you can trigger non-disruptive test plans and select which snapshot to recover to.
Failback: The failback process, which is delta-based, allows you to bring the servers back to your production site or discard them. You can then scale down your VMC SDDC and restore it to its initial configuration or delete it.
The DRaaS Connector is an OVA appliance that must be deployed and registered with vCenter Server in order for the protected site to be set up. Note that multiple protected sites (multiple vCenters) can connect to the same SaaS orchestrator and protect different sets of VMs.
Also, multiple DRaaS connectors can be installed on the same vCenter for reliability or performance purposes.
Virtual machines are protected using name patterns, schedules, and retention policies, regardless of the storage they reside on (NFS, VVOL …). Once the protection group is configured, the schedules will run automatically and trigger snapshots at each interval. These snapshots can be displayed later on.
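The three ingredients of a protection group (name patterns, a schedule, and a retention period) can be modelled in a few lines of Python. This is a conceptual sketch only: the `ProtectionGroup` class, its fields, and the VM names are hypothetical and do not reflect the product's API or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from fnmatch import fnmatchcase


@dataclass
class ProtectionGroup:
    """Hypothetical model of a protection group, for illustration only."""
    name: str
    vm_name_patterns: list[str]
    snapshot_interval: timedelta
    retention: timedelta

    def members(self, vm_names: list[str]) -> list[str]:
        """VMs whose name matches at least one shell-style pattern."""
        return sorted(
            vm for vm in vm_names
            if any(fnmatchcase(vm, pattern) for pattern in self.vm_name_patterns)
        )

    def snapshot_times(self, start: datetime, end: datetime) -> list[datetime]:
        """Every scheduled snapshot time between start and end, inclusive."""
        times = []
        current = start
        while current <= end:
            times.append(current)
            current += self.snapshot_interval
        return times

    def retained(self, snapshots: list[datetime], now: datetime) -> list[datetime]:
        """Snapshots that are still within the retention period."""
        return [s for s in snapshots if now - s <= self.retention]


group = ProtectionGroup(
    name="prod-databases",
    vm_name_patterns=["prod-db-*"],
    snapshot_interval=timedelta(hours=4),
    retention=timedelta(hours=12),
)
inventory = ["prod-db-01", "prod-web-01", "test-web-01", "prod-db-02"]
protected = group.members(inventory)

start = datetime(2021, 1, 1, 0, 0)
now = datetime(2021, 1, 2, 0, 0)
snapshots = group.snapshot_times(start, now)
kept = group.retained(snapshots, now)
print(protected, len(snapshots), len(kept))
```

Because membership is driven by patterns rather than a fixed list, a new VM named `prod-db-03` would be picked up by the group without editing it, which is the main appeal of pattern-based protection.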
DR Plans (Recovery Plans)
The DR plans are easy to configure and act similarly to recovery plans in SRM, like a catalogue of site-to-site plans. As mentioned previously, they are checked every 30 minutes to ensure compliance against the source site, the destination site, and the plan’s details.
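As a rough idea of what such a compliance check might verify, the sketch below validates a plan against the current state of both sites and works out how many checks a 30-minute cadence produces per day. The plan structure, the `check_plan` function, and the specific checks are assumptions for illustration; the actual checks performed by the service are not described here.

```python
from datetime import timedelta

CHECK_INTERVAL = timedelta(minutes=30)  # cadence stated in the article


def checks_per_day(interval: timedelta = CHECK_INTERVAL) -> int:
    """Number of health checks that fit in 24 hours."""
    return int(timedelta(days=1) / interval)


def check_plan(plan: dict, source_vms: set[str], target_networks: set[str]) -> list[str]:
    """Return a list of human-readable issues; an empty list means compliant."""
    issues = []
    # Source side: every VM referenced by the plan must still exist.
    for vm in plan["vms"]:
        if vm not in source_vms:
            issues.append(f"VM '{vm}' no longer exists on the protected site")
    # Destination side: every mapped network must exist in the recovery SDDC.
    for source_net, target_net in plan["network_mappings"].items():
        if target_net not in target_networks:
            issues.append(f"Mapping {source_net} -> {target_net}: target network missing")
    return issues


# Hypothetical plan and inventories.
plan = {
    "vms": ["db-01", "web-01"],
    "network_mappings": {"prod-net": "dr-net"},
}
issues = check_plan(plan, source_vms={"db-01"}, target_networks={"dr-net"})
print(checks_per_day(), issues)
```

Running checks this often means drift, such as a deleted VM or a renamed network, is caught long before the plan is actually needed.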
Similarly to SRM’s recovery plans, the creation of a DR Plan includes the following steps:
- Type: Failover to SDDC on VMware Cloud on AWS.
- Site, Groups: A protection site can contain multiple protection groups which define the inventory to orchestrate during a recovery.
- Resources mapping: Specifies where the recovered virtual machine must be configured (datastore, folder, port group…). Note that test bubbles are available for testing in the mapping.
- IP Address: Fine-tuning of the VM configuration to accommodate the target environment and ensure the workload will be reachable with access to the right network resources.
- Script extension: a “script virtual machine” (Windows or Linux) associated with a particular plan adds the ability to run custom scripts at the end of a DR plan to add checks or extra configuration.
- Recovery steps: This section sets the order of execution of the DR plan. For instance, you may want your DB servers to start before your middleware servers which themselves start before the web nodes.
- Alerts: Email alerts can be sent out to addresses to report on the status of the recovery.
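The ordering described in the recovery steps is essentially a dependency graph. As a small illustration, Python's standard `graphlib` module can turn "web depends on middleware, middleware depends on databases" into start-up batches. The VM names and the `boot_batches` helper are hypothetical, and this is only a way to reason about the order, not how the product computes it.

```python
from graphlib import TopologicalSorter


def boot_batches(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into batches; each batch only depends on earlier batches."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything that can start now
        batches.append(ready)
        sorter.done(*ready)                 # mark the batch as started
    return batches


# Each VM maps to the set of VMs that must be up before it starts.
dependencies = {
    "db-01": set(),
    "db-02": set(),
    "mw-01": {"db-01", "db-02"},
    "web-01": {"mw-01"},
    "web-02": {"mw-01"},
}
batches = boot_batches(dependencies)
for step, batch in enumerate(batches, start=1):
    print(f"Step {step}: {', '.join(batch)}")
```

With these dependencies, the databases start first, the middleware second, and the web nodes last, matching the example given in the recovery steps.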
We are obviously only scratching the surface of what can be done with VMware Cloud DR, but it hopefully gives a good overview of where VMware is going with this. This new offering, which doesn’t require you to pay for unused and expensive compute in the cloud, will probably be the tipping point for many customers. It is a great way to get them acquainted with the cloud by starting with the protection of a few non-critical workloads and working their way up from there.
As it has been repeated since the dawn of time, keep in mind that disaster recovery is not a form of backup and should not be treated as such. Solutions like Altaro VM Backup are still among the most crucial components to ensure a safe and reliable environment in terms of data protection.