Save to My DOJO
In the third and final post of this blog series, we will evaluate Microsoft’s replication solutions for multi-site clusters and how to integrate basic backup/DR with them. This includes Hyper-V Replica, Azure Site Recovery, and DFS Replication. In the first part of the series, you learned about setting up failover clusters to work with DR solutions and in the last post, you learned about disk replication considerations from third-party storage vendors. The challenge with the solutions that we previously discussed is that they typically require third-party hardware or software. Let’s look at the basic technologies provided by Microsoft to reduce these upfront fixed costs.
Note: The features talked about in this article are native Microsoft features with a baseline level of functionality. Should you require over and above what is required here you should look at a third-party backup/replication product such as Altaro VM Backup.
Multi-Site Disaster Recovery with Windows Server DFS Replication (DFSR)
DFS Replication (DFSR) is a Windows Server role service that has been around for many releases. Although DFSR is built into Windows Server and is easy to configure, it is not supported for multi-site clustering. This is because the replication of files only happens when a file is closed, so it works great for file servers hosting documents. However, it is not designed to work with application workloads where the file is kept open, such as SQL databases or Hyper-V VMs. Since these file types will only close during a planned failover or unplanned crash, it is hard to keep the data consistent at both sites. This means that if your first site crashes, the data will not be available at the second site, so DFSR should not be considered as a possible solution.
Multi-Site Disaster Recovery with Hyper-V Replica
The most popular Microsoft DR solution is Hyper-V Replica which is a built-in Hyper-V feature and available to Windows Server customers at no additional cost. It copies the virtual hard disk (VHD) file of a running virtual machine from one host to a second host in a different location. This is an excellent low-cost solution to replicate your data between your primary and secondary sites and even allows you to do extended (“chained”) replication to a third location. However, it is limited in that is only replicates Hyper-V virtual machines (VMs) so it cannot be used for any other application unless they are virtualized and running inside a VM. The way it works is that any changes to the VHD file are tracked by a log file, which is copied to an offline VM/VHD in the secondary site. This also means that replication is also asynchronous, allowing copies to be sent every 30 seconds, 5 minutes or 15 minutes. While this means that there is no distance limitation between the sites, there could be some data loss if any in-memory data has not been written to the disk or if there is a crash between replication cycles.
Hyper-V Replica allows for replication between standalone Hyper-V hosts or between separate clusters, or any combination. This means that instead of stretching a single cluster across two sites, you will set up two independent clusters. This also allows for a more affordable solution by letting businesses set up a cluster in their primary site and a single host in their secondary site that will be used only for mission-critical applications. If the Hyper-V Replica is deployed on a failover cluster, a new clustered workload type is created, known as the Hyper-V Replica Broker. This basically makes the replication service highly-available, so that if a node crashes, the replication engine will failover to a different node and continue to copy logs to the secondary site, providing greater resiliency.
Another powerful feature of Hyper-V Replica is its built-in testing, allowing you to simulate both planned and unplanned failures to the secondary site. While this solution will meet the needs of most virtualized datacenters, it is also important to remember that there are no integrity checks in the data which is being copied between the VMs. This means that if a VM becomes corrupted or is infected with a virus, that same fault will be sent to its replica. For this reason, backups of the virtual machine are still a critical part of standard operating procedure. Additionally, this Altaro blog notes that Hyper-V Replica has other limitations compared to backups when it comes to retention, file space management, keeping separate copies, using multiple storage locations, replication frequency and may have a higher total cost of ownership. If you are using a multi-site DR solution which uses two clusters, then make sure that you are taking and storing backups in both sites, so that you can recover your data at either location. Also make sure that your backup provider supports clusters, CSV disks, and Hyper-V replica, however, this is now standard in the industry.
Multi-Site Disaster Recovery with Azure Site Recovery (ASR)
All of the aforementioned solutions require you to have a second datacenter, which simply is not possible for some businesses. While you could rent rack space from a cohosting facility, the economics just may not make sense. Fortunately, the Microsoft Azure public cloud can now be used as your disaster recovery site using Azure Site Recovery (ASR). This technology works with Hyper-V Replica, but instead of copying your VMs to a secondary site, you are pushing them to a nearby Microsoft datacenter. This technology still has the same limitations of Hyper-V Replica, including the replication frequency, and furthermore you do not have access to the physical infrastructure of your DR site in Azure. The replicated VM can run on the native Azure infrastructure, or you can even build a virtualized guest cluster, and replicate to this highly-available infrastructure.
While ASR is a significantly cheaper solution than maintaining your own hardware in the secondary site, it is not free. You have to pay for the service, the storage of your virtual hard disks (VHDs) in the cloud, and if you turn on any of those VMs, you will pay for standard Azure VM operating costs.
If you are using ASR, you should follow the same backup best practices as mentioned in the earlier Hyper-V Replica section. The main difference will be that you should use an Azure-native backup solution to protect your replicated VHDs in Azure, in case you switch over the Azure VMs for any extended period of time.
From reviewing this blog series, you should be equipped to make the right decisions when planning your disaster recovery solution using multi-site clustering. Start by understanding your site restrictions and from there you can plan your hardware needs and storage replication solution. There are a variety of options that have tradeoffs between a higher price with more features to cost-effective solutions using Microsoft Azure, but have limited control. Even after you have deployed this resilient infrastructure, keep in mind that there are still three main reasons why disaster recovery plans fail:
- The detection of the outage failed, so the failover to the secondary datacenter never happens.
- One component in the DR failover process does not work, which is usually due to poor or infrequent testing.
- There was no automation or some dependency on humans during the process, which failed as humans create a bottleneck and are unreliable during a disaster.
This means that whichever solution you choose, make sure that it is well tested with quick failure detection and try to eliminate all dependencies on humans! Good luck with your deployment and please post any questions that you have in the comments section of this blog.
Not a DOJO Member yet?
Join thousands of other IT pros and receive a weekly roundup email with the latest content & updates!