Hyper-V Failover Clusters – Part 1: Overview
One of the greatest benefits of virtualization comes in the form of “high availability”. This term refers to a group of technologies that keep a virtual machine online even when its intended physical host is unavailable. In Hyper-V R2, high availability (HA) is achieved by creating Failover Clusters out of separate physical computers. This can be an intimidating subject, but with a little guidance you can have an operational cluster in no time. This is the first installment in a multi-part series on Hyper-V Failover Clustering. It explains what clustering is and what you’ll need to make it work.
How the Failover Cluster Works
Failover Clustering is not a new technology for Microsoft, nor is it specific to Hyper-V. Remote Desktop (formerly Terminal Services) administrators probably have the most familiarity with it, and a number of other applications can be clustered as well. In simple terms, two or more computers are joined into a cluster. An application is installed on each of them and then designated as a cluster resource. Other resources, such as disk storage, can also be designated as cluster resources. From there, behavior depends on the application or resource being clustered. For some, the application or resource is “owned” by one host at a time and can be manually or automatically transferred to any other host in the cluster. In other cases, the application runs on all hosts simultaneously and is balanced across them. Hyper-V is a combination of the two. The Hyper-V role runs on every host. Each virtual machine is placed on one host, but can be manually transferred by an administrator, automatically transferred by the Failover Cluster service in the event of an outage, or automatically moved by the Intelligent Placement feature of System Center Virtual Machine Manager (SCVMM).
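The ownership-and-failover model described above can be sketched as a toy simulation. This is purely illustrative — the class, method, and host names are invented for this example and do not correspond to any real clustering API; the actual work is done by the Windows Failover Cluster service.

```python
# Toy model of cluster resource ownership and failover (illustrative only).

class Cluster:
    def __init__(self, hosts):
        self.hosts = set(hosts)   # healthy physical hosts
        self.owner = {}           # resource name -> owning host

    def place(self, resource, host):
        """Each resource (e.g. a virtual machine) is owned by one host."""
        self.owner[resource] = host

    def live_migrate(self, resource, target):
        """Planned move: an administrator transfers a running resource."""
        if target in self.hosts:
            self.owner[resource] = target

    def host_failed(self, host):
        """Unplanned outage: the failed host's resources restart elsewhere."""
        self.hosts.discard(host)
        for resource, owner in list(self.owner.items()):
            if owner == host:
                # Pick any surviving host; real clusters use placement policies.
                self.owner[resource] = next(iter(self.hosts))

cluster = Cluster(["HV1", "HV2", "HV3"])
cluster.place("vm-web", "HV1")
cluster.place("vm-sql", "HV2")
cluster.host_failed("HV1")
print(cluster.owner["vm-web"])  # now owned by a surviving host
```

Note that in the failure case the virtual machine is restarted, not seamlessly moved — which is exactly the distinction the next section draws between high availability and fault tolerance.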
High Availability is not Fault Tolerance
Newcomers to virtualization are often confused by the difference between these terms. Hyper-V R2 does not have a native fault tolerance feature. You can use LiveMigration to move virtual machines from one host to another without perceptible downtime. However, if a host suffers a serious failure of some kind, such as a blue screen, its virtual machines will immediately stop. The Failover Cluster service will automatically transfer them to another host in the cluster and start them up, but they will suffer downtime. This is still considered “high availability” because those machines are only down for the amount of time it takes the Failover Cluster service to declare them down and begin spinning them up in their new locations. Contrast this scenario to the traditional non-virtualized model, in which the machine’s operating system and applications are offline until repairs can be made.
What You Need to Make a Hyper-V Cluster
For starters, you’ll need the proper hardware. The simple list is: two or more computers, one or more storage devices, and network switching equipment to connect it all.
- The Computers: The best practice for a cluster is to use identical computers, but it’s not a requirement. It is possible to use mixed hardware within your Hyper-V cluster, but the further the various components get from being identical, the more likely you are to encounter problems. Most importantly, Hyper-V cannot LiveMigrate between CPUs from different manufacturers, so creating a cluster that combines Intel and AMD chips is pretty much a waste of time for anything other than pure failover purposes — and even then, there’s no guarantee that your virtual machines will play well with the mixed environment. Whatever hardware you use, its CPU must support hardware virtualization (VT on Intel, AMD-V on AMD), and that support must be enabled in the BIOS. It must also have hardware Data Execution Prevention enabled in the BIOS (XD on Intel, NX on AMD). If you’re going to use RemoteFX, your CPU will need to support second-level address translation (SLAT) and your computers will need graphics processors that can run DirectX 9 and 10. Those GPUs must also have enough dedicated memory for all of the virtual machines that will be running on them. For optimal operation, each computer will need at least four network cards. Add a minimum of one more if you’ll be connecting to your storage devices by iSCSI.
- The Storage Device(s): A central requirement of failover clustering is shared storage. This was covered in depth in an earlier article, but the short form is that you’ll need some storage that all the computers in your cluster can see at once. Depending on your workload and budget, this can be a computer with Windows Server and iSCSI target software installed, a low-cost NAS device, or a powerful SAN. The device you choose must support SCSI-3 persistent reservations. Fibre Channel SANs almost always meet this requirement.
- The Network Equipment: You have to connect all that hardware, and the best way is through a switch. If you’re just going to be using an iSCSI system, a layer-2 managed switch with enough gigabit ports to handle all of your hardware’s network cards will suffice. It is considered a best practice to completely segregate your iSCSI traffic by placing it on its own physical switches that are not connected to any others. However, this is often impractical, especially when there are budgetary constraints. If you can’t physically separate your iSCSI network, ensure that you at least set up a separate VLAN for it. Fibre Channel installations have their own requirements, but it is usually preferable to use Fibre Channel switches rather than to directly connect the hosts to the storage devices.
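The hardware checklist above can be captured in a small planning script. This is a sketch under this article’s assumptions — the host records and function names are invented for illustration, and the thresholds (hardware virtualization and DEP enabled, four NICs minimum, one extra NIC for iSCSI) come from the list above, not from any Microsoft validation tool:

```python
# Sanity-check a planned Hyper-V cluster against this article's hardware notes.
# Host records and helper names are illustrative, not any real API.

def required_nics(uses_iscsi):
    # At least four NICs per host; one more if storage is reached over iSCSI.
    return 4 + (1 if uses_iscsi else 0)

def check_hosts(hosts, uses_iscsi):
    """Return a list of human-readable problems, empty if all hosts pass."""
    problems = []
    need = required_nics(uses_iscsi)
    for h in hosts:
        if not h["vt_enabled"]:
            problems.append(f"{h['name']}: hardware virtualization (VT/AMD-V) disabled in BIOS")
        if not h["dep_enabled"]:
            problems.append(f"{h['name']}: hardware DEP (XD/NX) disabled in BIOS")
        if h["nics"] < need:
            problems.append(f"{h['name']}: has {h['nics']} NICs, needs {need}")
    return problems

hosts = [
    {"name": "HV1", "vt_enabled": True, "dep_enabled": True, "nics": 5},
    {"name": "HV2", "vt_enabled": True, "dep_enabled": False, "nics": 4},
]
for issue in check_hosts(hosts, uses_iscsi=True):
    print(issue)
```

In this example, HV2 fails on two counts: DEP is disabled and it has only four NICs where an iSCSI-connected cluster node needs five.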
Once you’ve got the hardware out of the way, the next consideration is software. First, decide on the host operating system/parent partition, since that’s what gets installed first. In all cases, my recommendation is to directly install the free Hyper-V Server. It is a bit intimidating to work without a GUI at first, but there are plenty of guides to getting it going and keeping it functional, and it eliminates any concern about ever needing to license the base installation. If you do choose to install a licensed copy of Windows directly to the hardware, you must use Enterprise or Datacenter Edition; Standard Edition does not contain the Failover Clustering feature. Also, don’t use Server Core. The one and only reason to use Server Core over native Hyper-V would be to gain access to other roles that native Hyper-V doesn’t support, and that is not a good idea. A physical host for virtual machines should not be given any other responsibility. It forces Hyper-V to share the risks associated with those other roles, and any risk to Hyper-V is a risk to all of its virtual machines. Also, Server Core does not support RemoteFX, whereas native Hyper-V does.
After you’ve decided what to install, the next thing to consider is licensing. We’ve written a fairly detailed explanation of how that works for the parent partition and guest operating systems. You’ll also want to think about application software. The presence of a Hyper-V cluster can change the way you think about these things. For instance, suppose you purchase per-core (formerly per-CPU) licenses for Microsoft SQL Server. Unless you intend to run your SQL instances in only a single virtual machine, or you’re willing to give up LiveMigration and allow your SQL servers to move only by failover (i.e., a host crash), you’ll need to ensure that you’ve licensed every host on which your virtualized SQL servers could ever run. To be certain, it’s always best to contact a Microsoft licensing expert and discuss your particular case. Many software dealers will offer this consultation at no charge.
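The arithmetic behind that SQL Server example is simple but easy to underestimate. The sketch below uses made-up host names and core counts purely for illustration — actual licensing terms vary, which is why the article recommends talking to a licensing expert:

```python
# Back-of-the-envelope per-core licensing math for SQL Server VMs on a
# cluster. Host names and core counts are invented; real licensing rules
# have many more variables — confirm with a Microsoft licensing expert.

def cores_to_license(hosts, restrict_to=None):
    """Sum the physical cores of every host the SQL VMs could run on.

    hosts: dict of host name -> physical core count.
    restrict_to: optional set of host names the VMs are pinned to.
    """
    eligible = restrict_to if restrict_to is not None else hosts.keys()
    return sum(hosts[name] for name in eligible)

cluster = {"HV1": 16, "HV2": 16, "HV3": 16}

# LiveMigration allowed anywhere: every host is a possible home,
# so every host's cores need licensing.
print(cores_to_license(cluster))                       # 48

# SQL pinned to one host (giving up mobility): license only that host.
print(cores_to_license(cluster, restrict_to={"HV1"}))  # 16
```

Restricting where the VMs can run cuts the bill to a third in this example, which is exactly the trade-off between licensing cost and LiveMigration flexibility described above.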
In the next segment of this series, we’ll look at designing and installing a Hyper-V cluster.