Configuring Network Prioritization for Cluster Backup and Recovery

One of the most underutilized features in Windows Server Failover Clustering is the ability to configure and prioritize the network traffic of Hyper-V virtual machines (VMs) and their applications. This is mainly because the feature is hidden from the Failover Cluster Manager GUI and must be accessed using PowerShell. However, many admins consider it critical for optimizing their cluster traffic and building resiliency into their infrastructure. This blog post shares the best practices that I learned while I was an engineer on Microsoft’s Clustering Team, focusing on how to optimize your backup and recovery traffic when you really need it.

Remember that your organization is using a failover cluster to provide high availability for your important VMs and applications. A key tenet of clustering is that all the hardware must be redundant to avoid any single point of failure. This means that in addition to having multiple servers (hosts) and shared redundant storage, there are also multiple networks connecting each host to the rest of the datacenter. If any cluster network is being overused, then traffic using that network may be blocked or delayed. If traffic cannot get through, it can cause adverse effects on the system, such as triggering a false failover. By using network prioritization to define the importance of each type of traffic, you can eliminate bottlenecks and maximize your traffic flow.

In an ideal world, your cluster should have at least 4 networks, each with a primary role. If any of these networks fails, then its traffic will be rerouted through a different network. If you have fewer than 4 networks, you can combine roles. However, you generally want to separate and prioritize the following types of traffic as follows:

  1. Cluster and Cluster Shared Volumes (CSV) – The cluster network provides the critical communication path needed for the nodes to interface with each other, perform health checks, and route traffic. This network should be assigned the highest priority because if it is unavailable or nodes cannot communicate with each other, the cluster could pause or trigger a live migration or a failover, which can incur downtime.
  2. Storage – If you are using Ethernet-based connections to access your shared storage, such as SMB, iSCSI or Fibre Channel over Ethernet (FCoE), then you want to separate this network and make it your second-highest priority. Since the performance of your VMs or applications may be limited by their ability to access storage, generally this network should be optimized and dedicated to only this type of traffic.
  3. Public and Applications – This network should be used exclusively for connecting the clustered workloads or VMs to their applications or users. If this network is accessible via the public Internet, then it may be subject to denial-of-service (DoS) attacks, which flood the network and block other traffic from going through. For this reason, you should always separate this network so that any interference will not trigger any changes within your cluster.
  4. Management & Backup – The lowest priority network is usually the management network, which is used to support high-throughput yet infrequent traffic. This network will be used for live migration traffic to copy large amounts of memory between hosts; backup and recovery traffic to copy large amounts of critical data; patching traffic to distribute new updates to the hosts; and deployment traffic to copy OS image files, virtual hard disks (VHDs) or snapshots to VMs. It is usually recommended to be the lowest priority since it only gets used occasionally for specific tasks, and it is a good candidate to be the backup network for all the others.

Once you have decided, based on your hardware, which network will carry which type of traffic, you are ready to assign a priority to each network. Follow the guidance provided in this blog by Altaro on the Hyper-V Network Prioritization and Binding Order. At a high level, prioritization happens by assigning a value (“Metric”) to each network, with the lowest value being the highest priority. For example, you may have assigned your 4 networks as follows:

  • Cluster Network = 1000
  • Storage Network = 2000
  • Public Network = 3000
  • Management & Backup Network = 4000
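
As a rough sketch of how these values could be applied, the snippet below sets the Metric property on each cluster network through the FailoverClusters PowerShell module. The network names are placeholders chosen for illustration, not names from any real cluster, so list your own networks with Get-ClusterNetwork first and substitute them.

```powershell
# Minimal sketch: run on a cluster node with the FailoverClusters module installed.
Import-Module FailoverClusters

# Hypothetical network names (assumptions); replace with the names that
# Get-ClusterNetwork reports on your cluster.
$metrics = [ordered]@{
    'Cluster'           = 1000
    'Storage'           = 2000
    'Public'            = 3000
    'Management-Backup' = 4000
}

foreach ($name in $metrics.Keys) {
    # Assigning Metric by hand also turns AutoMetric off for that network.
    (Get-ClusterNetwork -Name $name).Metric = $metrics[$name]
}

# Review the result; the lowest Metric is the highest priority.
Get-ClusterNetwork | Sort-Object Metric | Format-Table Name, Role, AutoMetric, Metric
```

Keeping the values in one table at the top makes it easier to review the intended order before anything is changed on the cluster.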

Based on these recommendations, the backup network will be considered the lowest priority network as the traffic is irregular, but does that really make sense for such a critical function? Ensuring that every backup gets completed is important to prevent data loss and decrease your recovery time objective (RTO) and recovery point objective (RPO). Ensuring that you can restore a backup quickly is even more important for the business, but when using the lowest priority network, it can be challenging to force this traffic through. This is especially worrying after a catastrophic failure that causes the cluster to restart, because the cluster will prioritize its own cluster and storage traffic.

The easiest solution is to have a separate network entirely dedicated to backups and to assign it the second-highest priority. If you have a fifth network, this is easy:

  • Cluster Network = 1000
  • [NEW] Backup/Recovery Network = 1500
  • Storage Network = 2000
  • Public Network = 3000
  • Management Network = 4000

Or, if you are not using a network for Ethernet-based storage, you can reprioritize that network for backup and recovery so the configuration looks like this:

  • Cluster Network = 1000
  • [UPDATED] Backup/Recovery Network = 2000
  • Public Network = 3000
  • Management Network = 4000

However, most Hyper-V hosts have only 4 network interfaces and use Ethernet-based storage. In this case, you can dynamically change the prioritization of the network using a PowerShell script. This is easy to do when you are taking a regularly scheduled backup, whether that is hourly, daily, or weekly. A few minutes before you begin this backup task, run the script to switch the priority of your backup network so it has the second-highest priority. Then, several minutes after the backup has successfully completed, restore the original priority order using a second script.

Before Backup / Recovery:

  • Cluster Network = 1000
  • Storage Network = 2000
  • Public Network = 3000
  • Management & Backup Network = 4000

During Backup / Recovery:

  • Cluster Network = 1000
  • [UPDATED] Management & Backup Network = 1500
  • Storage Network = 2000
  • Public Network = 3000

After Backup / Recovery:

  • Cluster Network = 1000
  • Storage Network = 2000
  • Public Network = 3000
  • [UPDATED] Management & Backup Network = 4000
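
The two scripts described above could look something like the following sketch. The helper function Set-BackupNetworkMetric and the network name 'Management-Backup' are hypothetical names introduced here for illustration; only Get-ClusterNetwork and the Metric property come from the FailoverClusters module.

```powershell
# Minimal sketch: run on a cluster node with the FailoverClusters module installed.
Import-Module FailoverClusters

# Hypothetical helper (not a built-in cmdlet) that sets the metric of one network.
function Set-BackupNetworkMetric {
    param(
        [Parameter(Mandatory)]
        [uint32] $Metric,

        # Placeholder network name (assumption); use your own.
        [string] $NetworkName = 'Management-Backup'
    )

    $network = Get-ClusterNetwork -Name $NetworkName
    $network.Metric = $Metric

    # Read the value back so the change shows up in the script output or log.
    Get-ClusterNetwork -Name $NetworkName | Format-Table Name, State, Metric
}

# First script, run a few minutes before the backup starts:
Set-BackupNetworkMetric -Metric 1500

# Second script, run several minutes after the backup has completed:
Set-BackupNetworkMetric -Metric 4000
```

In practice the two calls would live in separate scripts, each run by whatever scheduler already starts your backup job.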

If you have detected that you need to restore a backup to a cluster, then you will follow a similar workflow of dynamically adjusting the prioritization.  This is a little more challenging as it will not automatically be scheduled and must be triggered only when a recovery is actually needed.  Whether this task is run automatically (recommended) or requires manual intervention, make sure that this script has been written and tested in advance.  Testing is critical to ensure regular backups and easy recovery, so if dynamic network adjustment is part of your backup plan, make sure that you are checking the states and priorities of your cluster networks before, during, and after this process.
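
For an unscheduled recovery, one possible approach is to wrap the restore in a try/finally block so that the original priority is put back even if the restore fails. This is only a sketch: the network name is a placeholder, and the restore step is left as a comment because it depends entirely on your backup product.

```powershell
# Minimal sketch: run on a cluster node with the FailoverClusters module installed.
Import-Module FailoverClusters

$networkName = 'Management-Backup'   # placeholder name (assumption)
$network     = Get-ClusterNetwork -Name $networkName

# Remember the starting configuration so it can be put back afterwards.
$originalMetric = $network.Metric
$wasAutoMetric  = $network.AutoMetric

# Before: confirm the network is healthy and record the current order.
if ($network.State -ne 'Up') {
    throw "Network '$networkName' is in state '$($network.State)'; aborting."
}
Get-ClusterNetwork | Sort-Object Metric | Format-Table Name, State, Metric | Out-Host

try {
    # During: raise the backup network to the second-highest priority.
    $network.Metric = 1500
    Get-ClusterNetwork | Sort-Object Metric | Format-Table Name, State, Metric | Out-Host

    # Start the restore here using your backup product's own tooling.
}
finally {
    # After: restore the original setting, whether or not the restore succeeded.
    $network = Get-ClusterNetwork -Name $networkName
    if ($wasAutoMetric) {
        $network.AutoMetric = $true
    }
    else {
        $network.Metric = $originalMetric
    }
    Get-ClusterNetwork | Sort-Object Metric | Format-Table Name, State, Metric | Out-Host
}
```

Printing the network list at each stage gives you the before, during, and after record that the testing advice above calls for.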

Even if you follow the recommendation to set your management network as your lowest priority due to the irregularity of its traffic, you now know how to quickly adjust the priority when needed.  This will give you the best chance to optimize your backups and restore them as quickly as possible when a disaster strikes.
