Q: Which Hyper-V Live Migration Performance Option should I choose?
A: If you have RDMA-capable physical adapters, choose SMB. Otherwise, choose Compression.
You’ve probably seen this page of your Hyper-V hosts’ Settings dialog box at some point:
The field descriptions go into some detail, but they don’t tell the entire story.
Live Migration Transport Option: TCP/IP
In this case, “TCP/IP” basically means: “don’t use compression or SMB”. Prior to 2012, this was the only mode available. A host opens up a channel to the target system on TCP port 6600 and shoots the data over as quickly as possible.
Live Migration Transport Option: Compression
Introduced in 2012, the Compression method mostly explains itself. The hosts still use port 6600, but the sender compresses the data prior to transmission. This technique has several things going for it:
The vast bulk of a Live Migration involves moving the virtual machine’s memory contents. Memory contents tend to compress quite readily, resulting in a substantially reduced payload size over the TCP/IP method
For most computing systems, the CPU cycles involved in compression are faster and cheaper than the computations needed to break down, transmit, and re-assemble multi-channel TCP/IP traffic
Works for any environment
Each virtual machine that you move simultaneously (limited by host settings) will get its own unique TCP channel. That gives the Dynamic and Hash load balancing algorithms an opportunity to use different physical pathways for simultaneous migrations
Live Migration Transport Option: SMB
Also new with 2012, the SMB transport method leverages the new capabilities of version 3 (and higher) of the SMB protocol. Two things matter for Live Migration:
SMB Direct: Leveraging RDMA-capable hardware, packets transmitted by SMB Direct move so quickly you’d almost think they arrived before they left. If you haven’t had a chance to see RDMA in action, you’re missing out. Unfortunately, you can’t get RDMA on the cheap.
SMB Multichannel: When multiple logical paths are available (as in, a single host with different IP addresses, preferably on different networks), SMB can break up traffic into multiple streams and utilize all available routes
Why Should I Favor Compression over SMB?
The SMB method sounds really good, right? Even if you can’t use SMB Direct, you get something from SMB multichannel, right? Well… no… not much. Processing of TCP/IP packets and Ethernet frames has always been intensive at scale. Ordinarily, our server computers don’t move much data so we don’t see it. However, a Live Migration pushes lots of data. Even keeping a single Ethernet stream intact and in order can cause a burden on your networking hardware. Breaking it up into multiple pieces and re-assembling everything in the correct order across multiple channels can pose a nightmare scenario. However, SMB Direct can offload enough of the basic network processing to nearly trivialize the effort. Without that aid, Compression will be faster for most people.
Should I Ever Prefer the Plain TCP/IP Method?
I have not personally encountered a scenario in which I would prefer TCP/IP over the other choices. However, it does cause the least amount of host load. If your hosts have very high normal CPU usage and you want Live Migrations to occur as discreetly as possible, choose TCP/IP. You may add in a QoS layer to tone it down further.
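The advice so far boils down to a short decision rule. Here is a minimal sketch of it in Python. The function name and the two boolean inputs are hypothetical, and letting RDMA win when a host is also CPU-constrained is my assumption, based on SMB Direct offloading most of the network processing.

```python
def choose_live_migration_option(has_rdma: bool, cpu_constrained: bool = False) -> str:
    """Return the performance option suggested by the rule of thumb above.

    has_rdma: the hosts have RDMA-capable physical adapters (SMB Direct works).
    cpu_constrained: the hosts normally run at very high CPU usage and
        migrations should add as little host load as possible.
    """
    if has_rdma:
        # SMB Direct offloads enough network processing to make SMB the fast path.
        return "SMB"
    if cpu_constrained:
        # Plain TCP/IP causes the least host load (assumed tie-break order).
        return "TCP/IP"
    # Without RDMA, Compression will be faster for most people.
    return "Compression"


for rdma, busy in [(True, False), (False, False), (False, True)]:
    print(rdma, busy, choose_live_migration_option(rdma, busy))
```

Treat the result as a starting point, not a verdict; measure a few migrations before settling on an option.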
Do you disagree with my assessment, or have you encountered situations that don’t line up with my advice? I’m happy to hear your thoughts on the Live Migration Performance Options and which to choose in various circumstances. Write to me in the comments below.
If you dig on TechNet a bit, you can find an article outlining how to architect networks for a 2008 R2 Hyper-V cluster. While it was perfect for its time, we have new technologies that make its advice obsolete. I have two reasons for bringing it up:
Some people still follow those guidelines on new builds — worse, they recommend it to others
Even though we no longer follow that implementation practice, we still need to solve the same fundamental problems
We changed practices because we gained new tools to address our cluster networking problems.
What Do Cluster Networks Need to Accomplish for Hyper-V?
Our root problem has never changed: we need to ensure that we always have enough available bandwidth to prevent choking out any of our services or inter-node traffic. In 2008 R2, we could only do that by using multiple physical network adapters and designating traffic types to individual pathways. Note: It was possible to use third-party teaming software to overcome some of that challenge, but that was never supported and introduced other problems.
Starting from our basic problem, we next need to determine how to delineate those various traffic types. That original article did some of that work. We can immediately identify what appears to be four types of traffic:
Management (communications with hosts outside the cluster, ex: inbound RDP connections)
Standard inter-node cluster communications (ex: heartbeat, cluster resource status updates)
Cluster Shared Volume traffic
Live Migration traffic
However, it turns out that some clumsy wording caused confusion. Cluster communication traffic and Cluster Shared Volume traffic are exactly the same thing. That reduces our needs to three types of cluster traffic.
What About Virtual Machine Traffic?
You might have noticed that I didn’t say anything about virtual machine traffic above. The same would be true if you were working up a different kind of cluster, such as SQL. I certainly understand the importance of that traffic; in my mind, service traffic prioritizes above all cluster traffic. Understand one thing: service traffic for external clients is not clustered. So, your cluster of Hyper-V nodes might provide high availability services for virtual machine vmabc, but all of vmabc’s network traffic will only use its owning node’s physical network resources. So, you will not architect any cluster networks to process virtual machine traffic.
As for preventing cluster traffic from squelching virtual machine traffic, we’ll revisit that in an upcoming section.
Fundamental Terminology and Concepts
These discussions often go awry over a misunderstanding of basic concepts.
Cluster Name Object: A Microsoft Failover Cluster has its own identity separate from its member nodes known as a Cluster Name Object (CNO). The CNO uses a computer name, appears in Active Directory, has an IP, and registers in DNS. Some clusters, such as SQL, may use multiple CNOs. A CNO must have an IP address on a cluster network.
Cluster Network: A Microsoft Failover Cluster scans its nodes and automatically creates “cluster networks” based on the discovered physical and IP topology. Each cluster network constitutes a discrete communications pathway between cluster nodes.
Management network: A cluster network that allows inbound traffic meant for the member host nodes and typically used as their default outbound network to communicate with any system outside the cluster (e.g. RDP connections, backup, Windows Update). The management network hosts the cluster’s primary cluster name object. Typically, you would not expose any externally-accessible services via the management network.
Access Point (or Cluster Access Point): The IP address that belongs to a CNO.
Roles: The name used by Failover Cluster Management for the entities it protects (e.g. a virtual machine, a SQL instance). I generally refer to them as services.
Partitioned: A status that the cluster will give to any network on which one or more nodes do not have a presence or cannot be reached.
SMB: ALL communications native to failover clustering use Microsoft’s Server Message Block (SMB) protocol. With the introduction of version 3 in Windows Server 2012, that now includes innate multi-channel capabilities (and more!)
Are Microsoft Failover Clusters Active/Active or Active/Passive?
Microsoft Failover Clusters are active/passive. Every node can run services at the same time as the other nodes, but no single service can be hosted by multiple nodes. In this usage, “service” does not mean those items that you see in the Services Control Panel applet. It refers to what the cluster calls “roles” (see above). Only one node will ever host any given role or CNO at any given time.
How Does Microsoft Failover Clustering Identify a Network?
The cluster decides what constitutes a network; your build guides it, but you do not have any direct input. Any time the cluster’s network topology changes, the cluster service re-evaluates.
First, the cluster scans a node for logical network adapters that have IP addresses. That might be a physical network adapter, a team’s logical adapter, or a Hyper-V virtual network adapter assigned to the management operating system. It does not see any virtual NICs assigned to virtual machines.
For each discovered adapter and IP combination on that node, it builds a list of networks from the subnet masks. For instance, if it finds an adapter with an IP of 192.168.10.20 and a subnet mask of 255.255.255.0, then it creates a 192.168.10.0/24 network.
The cluster then continues through all of the other nodes, following the same process.
Be aware that every node does not need to have a presence in a given network in order for failover clustering to identify it; however, the cluster will mark such networks as partitioned.
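To make the discovery process concrete, here is a rough Python model of it using the standard ipaddress module. It is only a sketch: the function name and input shape are made up for illustration, and it covers just the subnet grouping and the "node has no presence" cause of partitioning, not reachability checks or the adapter-first behavior described in the next section.

```python
import ipaddress
from collections import defaultdict


def discover_cluster_networks(node_addresses):
    """Group each node's (ip, subnet_mask) pairs into networks.

    node_addresses maps a node name to a list of (ip, subnet_mask) strings.
    Returns a dict keyed by network in CIDR form, listing the member nodes
    and whether the network would be considered partitioned.
    """
    members = defaultdict(set)
    for node, addresses in node_addresses.items():
        for ip, mask in addresses:
            # strict=False accepts a host address and yields its containing network.
            network = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
            members[network].add(node)

    all_nodes = set(node_addresses)
    return {
        str(network): {
            "nodes": sorted(nodes),
            # A network that lacks a presence on any node is marked partitioned.
            "partitioned": nodes != all_nodes,
        }
        for network, nodes in members.items()
    }


example = {
    "node1": [("192.168.10.20", "255.255.255.0"), ("192.168.77.1", "255.255.255.0")],
    "node2": [("192.168.10.21", "255.255.255.0")],
}
for name, info in discover_cluster_networks(example).items():
    print(name, info)
```

In this example, both nodes land in 192.168.10.0/24, while 192.168.77.0/24 exists only on node1 and so shows up as partitioned.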
What Happens if a Single Adapter has Multiple IPs?
If you assign multiple IPs to the same adapter, one of two things will happen. Which of the two depends on whether or not the secondary IP shares a subnet with the primary.
When an Adapter Hosts Multiple IPs in Different Networks
The cluster identifies networks by adapter first. Therefore, if an adapter has multiple IPs, the cluster will lump them all into the same network. If another adapter on a different host has an IP in one of the networks but not all of the networks, then the cluster will simply use whichever IPs can communicate.
As an example, see the following network:
The second node has two IPs on the same adapter and the cluster has added it to the existing network. You can use this to re-IP a network with minimal disruption.
A natural question: what happens if you spread IPs for the same subnet across different existing networks? I tested it a bit and the cluster allowed it and did not bring the networks down. However, it always had the functional IP pathway to use, so that doesn’t tell us much. Had I removed the functional pathways, then it would have collapsed the remaining IPs into an all-new network and it would have worked just fine. I recommend keeping an eye on your IP scheme and not allowing things like that in the first place.
When an Adapter Hosts Multiple IPs in the Same Network
The cluster will pick a single IP in the same subnet to represent the host in that network.
What if Different Adapters on the Same Host have an IP in the Same Subnet?
The same outcome occurs as if the IPs were on the same adapter: the cluster picks one to represent the host and ignores the rest.
The Management Network
All clusters (Hyper-V, SQL, SOFS, etc.) require a network that we commonly dub Management. That network contains the CNO that represents the cluster as a singular system. The management network has little importance for Hyper-V, but external tools connect to the cluster using that network. By necessity, the cluster nodes use IPs on that network for their own communications.
The management network will also carry cluster-specific traffic. More on that later.
Cluster heartbeat information. Each node must hear from every other node within a specific amount of time (1 second by default). If it does not hear from enough nodes to maintain quorum, then it will begin failover procedures. Failover is more complicated than that, but beyond the scope of this article.
Cluster configuration changes. If any configuration item changes, whether to the cluster’s own configuration or the configuration or status of a protected service, the node that processes the change will immediately transmit to all of the other nodes so that they can update their own local information store.
Cluster Shared Volume traffic. When all is well, this network will only carry metadata information. Basically, when anything changes on a CSV that updates its volume information table, that update needs to be duplicated to all of the other nodes. If the change occurs on the owning node, less data needs to be transmitted, but it will never be perfectly quiet. So, this network can be quite chatty, but will typically use very little bandwidth. However, if one or more nodes lose direct connectivity to the storage that hosts a CSV, all of its I/O will route across a cluster network. Network saturation will then depend on the amount of I/O the disconnected node(s) need(s).
Live Migration Networks
That heading is a bit of a misnomer. The cluster does not have its own concept of a Live Migration network per se. Instead, you let the cluster know which networks you will permit to carry Live Migration traffic. You can independently choose whether or not those networks can carry other traffic.
Other Identified Networks
The cluster may identify networks that we don’t want to participate in any kind of cluster communications at all. iSCSI serves as the most common example. We’ll learn how to deal with those.
Now we know our traffic types. Next, we need to architect our cluster networks to handle them appropriately. Let’s begin by understanding why you shouldn’t take the easy route of using a singular network. A minimally functional Hyper-V cluster only requires that “management” network. Stopping there leaves you vulnerable to three problems:
The cluster will be unable to select another IP network for different communication types. As an example, Live Migration could choke out the normal cluster heartbeat, causing nodes to consider themselves isolated and shut down
The cluster and its hosts will be unable to perform efficient traffic balancing, even when you utilize teams
IP-based problems in that network (even external to the cluster) could cause a complete cluster failure
Therefore, you want to create at least one other network. In the pre-2012 model we could designate specific adapters to carry specific traffic types. In the 2012 and later model, we simply create at least one additional network that allows cluster communications but not client access. Some benefits:
Clusters of version 2012 or newer will automatically employ SMB multichannel. Inter-node traffic (including Cluster Shared Volume data) will balance itself without further configuration work.
The cluster can bypass trouble on one IP network by choosing another; you can help by disabling a network in Failover Cluster Manager
Better load balancing across alternative physical pathways
The Second Supporting Network… and Beyond
Creating networks beyond the initial two can add further value:
If desired, you can specify networks for Live Migration traffic, and even exclude those from normal cluster communications. Note: For modern deployments, doing so typically yields little value
If you host your cluster networks on a team, matching the number of cluster networks to physical adapters allows the teaming and multichannel mechanisms the greatest opportunity to fully balance transmissions. Note: You cannot guarantee a perfectly smooth balance
Architecting Hyper-V Cluster Networks
Now we know what we need and have a nebulous idea of how that might be accomplished. Let’s get into some real implementation. Start off by reviewing your implementation choices. You have three options for hosting a cluster network:
One physical adapter or team of adapters per cluster network
Convergence of one or more cluster networks onto one or more physical teams or adapters
Convergence of one or more cluster networks onto one or more physical teams claimed by a Hyper-V virtual switch
A few pointers to help you decide:
For modern deployments, avoid using one adapter or team for a cluster network. It makes poor use of available network resources by forcing an unnecessary segregation of traffic.
I personally do not recommend bare teams for Hyper-V cluster communications. You would need to exclude such networks from participating in a Hyper-V switch, which would also force an unnecessary segregation of traffic.
The most even and simple distribution involves a singular team with a Hyper-V switch that hosts all cluster network adapters and virtual machine adapters. Start there and break away only as necessary.
A single 10 gigabit adapter swamps multiple gigabit adapters. If your hosts have both, don’t even bother with the gigabit.
To simplify your architecture, decide early:
How many networks you will use. They do not need to have different functions. For example, the old management/cluster/Live Migration/storage breakdown no longer makes sense. One management and three cluster networks for a four-member team does make sense.
The IP structure for each network. For networks that will only carry cluster (including intra-cluster Live Migration) communication, the chosen subnet(s) do not need to exist in your current infrastructure. As long as each adapter in a cluster network can reach all of the others at layer 2 (Ethernet), then you can invent any IP network that you want.
I recommend that you start off expecting to use a completely converged design that uses all physical network adapters in a single team. Create Hyper-V network adapters for each unique cluster network. Stop there, and make no changes unless you detect a problem.
Comparing the Old Way to the New Way (Gigabit)
Let’s start with a build that would have been common in 2010 and walk through our options up to something more modern. I will only use gigabit designs in this section; skip ahead for 10 gigabit.
In the beginning, we couldn’t use teaming. So, we used a lot of gigabit adapters:
There would be some variations of this. For instance, I would have added another adapter so that I could use MPIO with two iSCSI networks. Some people used Fibre Channel and would not have iSCSI at all.
Important Note: The “VMs” that you see there means that I have a virtual switch on that adapter and the virtual machines use it. It does not mean that I have created a VM cluster network. There is no such thing as a VM cluster network. The virtual machines are unaware of the cluster and they will not talk to it (if they do, they’ll use the Management access point like every other non-cluster system).
Then, 2012 introduced teaming. We could then do all sorts of fun things with convergence. My very least favorite:
This build takes teams to an excess. Worse, the management, cluster, and Live Migration teams will be idle almost all the time, meaning that 60% of this host’s networking capacity will be generally unavailable.
Let’s look at something a bit more common. I don’t like this one much, but I’m not revolted by it either:
A lot of people like that design because, so they say, it protects the management adapter from problems that affect the other roles. I cannot figure out how they perform that calculus. Teaming addresses any probable failure scenarios. For anything else, I would want the entire host to fail out of the cluster. In this build, a failure that brought the team down but not the management adapter would cause its hosted VMs to become inaccessible because the node would remain in the cluster. That’s because the management adapter would still carry cluster heartbeat information.
My preferred design follows:
Now we are architected against almost all types of failure. In a “real-world” build, I would still have at least two iSCSI NICs using MPIO.
What is the Optimal Gigabit Adapter Count?
Because we had one adapter per role in 2008 R2, we often continue using the same adapter count in our 2012+ builds. I don’t feel that’s necessary for most builds. I am inclined to use two or three adapters in data teams and two adapters for iSCSI. For anything past that, you’ll need to have collected some metrics to justify the additional bandwidth needs.
10 Gigabit Cluster Network Design
10 gigabit changes all of the equations. In reasonable load conditions, a single 10 gigabit adapter moves data more than 10 times faster than a single gigabit adapter. When using 10 GbE, you need to change your approaches accordingly. First, if you have both 10GbE and gigabit, just ignore the gigabit. It is not worth your time. If you really want to use it, then I would consider using it for iSCSI connections to non-SSD systems. Most installations relying on iSCSI-connected spinning disks cannot sustain even 2 Gbps, so gigabit adapters would suffice.
Logical Adapter Counts for Converged Cluster Networking
I didn’t include the Hyper-V virtual switch in any of the above diagrams, mostly because it would have made the diagrams more confusing. However, I would use a Hyper-V team to host all of the logical adapters necessary. For a non-Hyper-V cluster, I would create a logical team adapter for each role. Remember that on a logical team, you can only have a single logical adapter per VLAN. The Hyper-V virtual switch has no such restrictions. Also remember that you should not use multiple logical team adapters on any team that hosts a Hyper-V virtual switch. Some of the behavior is undefined and your build might not be supported.
I would always use these logical/virtual adapter counts:
One management adapter
A minimum of one cluster communications adapter up to n-1, where n is the number of physical adapters in the team. You can subtract one because the management adapter acts as a cluster adapter as well
In a gigabit environment, I would add at least one logical adapter for Live Migration. That’s optional because, by default, all cluster-enabled networks will also carry Live Migration traffic.
In a 10 GbE environment, I would not add designated Live Migration networks. It’s just logical overhead at that point.
In a 10 GbE environment, I would probably not set aside physical adapters for storage traffic. At those speeds, the differences in offloading technologies don’t mean that much.
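Those counts are simple enough to express as arithmetic. The following Python sketch uses a hypothetical function name and dictionary keys; the one assumption I add is a floor of one cluster adapter, so that a single-adapter host still gets one.

```python
def plan_logical_adapters(physical_adapters: int, ten_gigabit: bool) -> dict:
    """Suggest logical/virtual adapter counts for a converged team.

    physical_adapters: number of physical adapters in the team (n).
    ten_gigabit: True for a 10 GbE team, False for gigabit.
    """
    if physical_adapters < 1:
        raise ValueError("a team needs at least one physical adapter")
    return {
        "management": 1,
        "cluster_min": 1,
        # Up to n-1, because the management adapter also acts as a cluster adapter.
        "cluster_max": max(1, physical_adapters - 1),
        # Optional on gigabit; just logical overhead on 10 GbE.
        "live_migration": 0 if ten_gigabit else 1,
    }


print(plan_logical_adapters(4, ten_gigabit=False))
print(plan_logical_adapters(2, ten_gigabit=True))
```

For a four-member gigabit team, that works out to one management adapter, between one and three cluster adapters, and one Live Migration adapter.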
Architecting IP Addresses
Congratulations! You’ve done the hard work! Now you just need to come up with an IP scheme. Remember that the cluster builds networks based on the IPs that it discovers.
Every network needs one IP address for each node. Any network that contains an access point will need an additional IP for the CNO. For Hyper-V clusters, you only need a management access point. The other networks don’t need a CNO.
Only one network really matters: management. Your physical nodes must use that to communicate with the “real” network beyond. Choose a set of IPs available on your “real” network.
For all the rest, the member IPs only need to be able to reach each other over layer 2 connections. If you have an environment with no VLANs, then just make sure that you pick IPs in networks that don’t otherwise exist. For instance, you could use 192.168.77.0/24 for something, as long as that’s not a “real” range on your network. Any cluster network without a CNO does not need to have a gateway address, so it doesn’t matter that those networks won’t be routable. It’s preferred, in fact.
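If you want to sanity-check an invented range before you use it, something like the following Python sketch can help. The function names and the sample "real" ranges are hypothetical; it only checks for overlap with ranges you list yourself and hands out the first usable host addresses in order.

```python
import ipaddress
from itertools import islice


def addresses_needed(node_count: int, has_access_point: bool) -> int:
    """One IP per node, plus one for the CNO if the network holds an access point."""
    return node_count + (1 if has_access_point else 0)


def plan_cluster_only_network(candidate, real_networks, node_names):
    """Check an invented subnet against known real ranges and assign node IPs.

    candidate: CIDR string for the invented network, e.g. "192.168.77.0/24".
    real_networks: CIDR strings for ranges that already exist on your network.
    node_names: cluster node names, in the order they should receive addresses.
    """
    subnet = ipaddress.ip_network(candidate)
    for real in real_networks:
        if subnet.overlaps(ipaddress.ip_network(real)):
            raise ValueError(f"{candidate} overlaps existing range {real}")

    # Cluster-only networks need no CNO, so one address per node is enough.
    wanted = addresses_needed(len(node_names), has_access_point=False)
    hosts = list(islice(subnet.hosts(), wanted))
    if len(hosts) < wanted:
        raise ValueError(f"{candidate} is too small for {wanted} nodes")
    return dict(zip(node_names, (str(host) for host in hosts)))


print(plan_cluster_only_network(
    "192.168.77.0/24",
    ["10.0.0.0/8", "192.168.10.0/24"],
    ["node1", "node2", "node3"],
))
```

No gateway appears anywhere in the result, which matches the point above: these networks do not need to be routable.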
Implementing Hyper-V Cluster Networks
Once you have your architecture in place, you only have a little work to do. Remember that the cluster will automatically build networks based on the subnets that it discovers. You only need to assign names and set them according to the type of traffic that you want them to carry. You can choose:
Allow cluster communication (inter-node heartbeat, configuration updates, and Cluster Shared Volume traffic)
Allow client connectivity to cluster resources together with cluster communications (you cannot choose client connectivity without cluster connectivity)
Prevent participation in cluster communications (often used for iSCSI and sometimes connections to external SMB storage)
As much as I like PowerShell for most things, Failover Cluster Manager makes this all very easy. Access the Networks tree of your cluster:
I’ve already renamed mine in accordance with their intended roles. A new build will have “Cluster Network”, “Cluster Network 1”, etc. Double-click on one to see which IP range(s) it assigned to that network:
Work your way through each network, setting its name and what traffic type you will allow. Your choices:
Allow cluster network communication on this network AND Allow clients to connect through this network: use these two options together for the management network. If you’re building a non-Hyper-V cluster that needs access points on non-management networks, use these options for those as well. Important: The adapters in these networks SHOULD register in DNS.
Allow cluster network communication on this network ONLY (do not check Allow clients to connect through this network): use for any network that you wish to carry cluster communications (remember that includes CSV traffic). Optionally use for networks that will carry Live Migration traffic (I recommend that). Do not use for iSCSI networks. Important: The adapters in these networks SHOULD NOT register in DNS.
Do not allow cluster network communication on this network: Use for storage networks, especially iSCSI. I also use this setting for adapters that will use SMB to connect to a storage server running SMB version 3.02 in order to run my virtual machines. You might want to use it for Live Migration networks if you wish to segregate Live Migration from cluster traffic (I do not do or recommend that).
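If you keep your network plan in a script or document, the three choices can be summarized as a small lookup table. This Python sketch is illustrative only: the purpose labels are hypothetical, and the numeric values (0, 1, 3) are, as far as I know, the ones the cluster reports for a network's Role property in PowerShell; verify them on your own cluster before relying on them. The DNS column reflects only what the list above states, so storage networks are left as None.

```python
from enum import IntEnum


class ClusterNetworkRole(IntEnum):
    NONE = 0                # do not allow cluster network communication
    CLUSTER_ONLY = 1        # cluster communication only
    CLUSTER_AND_CLIENT = 3  # cluster communication plus client connections


# Hypothetical purpose labels mapped to (role, register adapters in DNS?).
RECOMMENDATIONS = {
    "management": (ClusterNetworkRole.CLUSTER_AND_CLIENT, True),
    "cluster": (ClusterNetworkRole.CLUSTER_ONLY, False),
    "live_migration": (ClusterNetworkRole.CLUSTER_ONLY, False),
    "iscsi": (ClusterNetworkRole.NONE, None),
    "smb_storage": (ClusterNetworkRole.NONE, None),
}


def recommended_settings(purpose: str) -> dict:
    """Look up the suggested cluster role and DNS registration for a network."""
    role, register_in_dns = RECOMMENDATIONS[purpose]
    return {"role": role, "register_in_dns": register_in_dns}


for purpose in RECOMMENDATIONS:
    print(purpose, recommended_settings(purpose))
```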
Once done, you can configure Live Migration traffic. Right-click on the Networks node and click Live Migration Settings:
Check a network’s box to enable it to carry Live Migration traffic. Use the Up and Down buttons to prioritize.
What About Traffic Prioritization?
In 2008 R2, we had some fairly arcane settings for cluster network metrics. You could use those to adjust which networks the cluster would choose as alternatives when a primary network was inaccessible. We don’t use those anymore because SMB multichannel just figures things out. However, be aware that the cluster will deliberately choose Cluster Only networks over Cluster and Client networks for inter-node communications.
What About Hyper-V QoS?
When 2012 first debuted, it brought Hyper-V networking QoS along with it. That was some really hot new tech, and lots of us dove right in and lost a lot of sleep over finding the “best” configuration. And then, most of us realized that our clusters were doing a fantastic job balancing things out all on their own. So, I would recommend that you avoid tinkering with Hyper-V QoS unless you have tried going without and had problems. Before you change anything, determine what traffic needs to be limited or boosted. Do not simply start flipping switches, because the rest of us already tried that and didn’t get results. If you need to change QoS, start with this TechNet article.
Does your preferred network management system differ from mine? Have you decided to give my arrangement a try? How did you get on? Let me know in the comments below; I really enjoy hearing from you guys!
Automating deployments has quickly become the norm for IT professionals servicing most organizations from small-scale up. But automation can go far beyond just deploying VMs. It is used to configure Active Directory inside a VM, File Services, DNS, etc. automatically providing a boost to productivity, accuracy and workload management.
Earlier this year I had the privilege of speaking for the MVP Days Virtual Conference. For those that aren’t aware, the MVP Days Virtual Conference is a monthly event hosted by Dave and Cristal Kawula showcasing the skills and know-how of individuals in the Microsoft MVP Program. The idea being that Microsoft MVPs are a great resource of knowledge for IT Pros, and this virtual conference gives them a platform to share that knowledge on a monthly basis.
The following video is a recording of my presentation “3 Tools for Automating Deployments in the Era of the Modern Hybrid Cloud”.
Deployments in the Hybrid Cloud
Workloads and IT Infrastructures are becoming more complex and spread out than ever. It used to be that IT Pros had little to worry about outside the confines of their network, but those days are long over. Today a new workload is just as likely to be successfully residing in the public cloud as it is on premises. Cloud computing technologies like Microsoft Azure have provided a number of capabilities to IT Pros that were previously unheard of in all but the most complex enterprise datacenters. The purview of the IT Pro no longer stops within the 4 walls of his/her network but wherever the workload lives at a given time.
With these new innovations and technologies comes the ability to mass deploy applications and services either on-premises or in the public cloud in a very easy way, but what happens when you need to automate the deployment of workloads and services that stretch from on-premises to the public cloud? Many IT Pros struggle to automate deployments that stretch across those boundaries for a true hybrid cloud deployment.
In this demo-heavy session, you’ll discover a number of tools to assist you with your deployment operations. Learn:
How PowerShell ties together and executes your deployment strategy end-to-end
How PowerShell Direct is used on-premises with Hyper-V to automate more than ever before
How Azure IaaS is used to effortlessly extend your automated deployments to the cloud
The Video: 3 Tools for Automating Deployments in the Era of the Modern Hybrid Cloud
Be sure to share your thoughts on the session with us in the comments section below! I’m especially interested if there are any 3rd party tools or other methods you’ve used in these kinds of deployment situations. I’d also like to hear about any challenges you’ve encountered in doing operations like this. We’ll be sure to take that info and put together some relevant posts to assist.
Q: How many networks should I employ for my clustered Hyper-V Hosts?
A: At least two, architected for redundancy, not services.
This answer serves as a quick counter to oft-repeated cluster advice from the 2008/2008 R2 era. Many things have changed since then and architecture needs to keep up. It assumes that you already know how to configure networks for failover clustering.
In this context, “networks” means “IP networks”. Microsoft Failover Cluster defines and segregates networks by their subnets:
Why the Minimum of Two?
Using two or more networks grants multiple benefits:
The cluster automatically bypasses some problems in IP networks, preventing any one problem from bringing the entire cluster down
External: a logical network failure that breaks IP communication
Internal: a Live Migration that chokes out heartbeat information, causing nodes to exit the cluster
An administrator can manually exclude a network to bypass problems
If hosted by a team, the networking stack can optimize traffic more easily when given multiple IP subnets
If necessary, traffic types can be prioritized
Your two networks must contain one “management” network (allows for cluster and client connections). All other networks beyond the first should either allow cluster communications only or prevent all cluster communications (ex: iSCSI). A Hyper-V cluster does not need more than one management network.
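These rules are easy to check mechanically. Here is a minimal Python sketch of such a check. The function name, the three role labels, and the message wording are all hypothetical, and counting only cluster-enabled networks toward the minimum of two is my reading of the rule, since a network excluded from cluster communications (such as iSCSI) cannot provide a bypass path.

```python
def validate_network_plan(plan):
    """Return a list of problems found in a cluster network plan.

    plan maps a network name to one of:
      "cluster_and_client" (management), "cluster_only", or "none".
    """
    valid_roles = {"cluster_and_client", "cluster_only", "none"}
    problems = []

    for name, role in plan.items():
        if role not in valid_roles:
            problems.append(f"{name}: unknown role {role!r}")

    management = [n for n, r in plan.items() if r == "cluster_and_client"]
    cluster_enabled = [n for n, r in plan.items() if r in ("cluster_and_client", "cluster_only")]

    if not management:
        problems.append("no management network (cluster and client)")
    elif len(management) > 1:
        problems.append("more than one management network; a Hyper-V cluster does not need that")

    if len(cluster_enabled) < 2:
        problems.append("fewer than two networks carry cluster communications")

    return problems


print(validate_network_plan({"Management": "cluster_and_client", "iSCSI": "none"}))
print(validate_network_plan({"Management": "cluster_and_client", "Cluster": "cluster_only", "iSCSI": "none"}))
```

The first plan fails because only one network can carry cluster traffic; the second passes with an empty list.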
How Many Total?
You will need to make architectural decisions to arrive at the exact number of networks appropriate for your system. Tips:
Do not use services as a deciding point. For instance, do not build a dedicated Live Migration or CSV network. Let the system balance traffic.
In some rare instances, you may have network congestion that necessitates segregation. For example, heavy Live Migration traffic over few gigabit adapters. In that case, create a dedicated Live Migration network and employ Hyper-V QoS to limit its bandwidth usage
Do take physical pathways into account. If you have four physical network adapters in a team that hosts your cluster networks, then create four cluster networks.
Avoid complicated network builds. I see people trying to make sense out of things like 6 teams and two Hyper-V switches on 8x gigabit adapters with 4x 10-gigabit adapters. You will create a micro-management nightmare situation without benefit. If you have any 10-gigabit, just stop using the gigabit. Preferably, converge onto one team and one Hyper-V switch. Let the system balance traffic.
Do you have a question for Eric?
Ask your question in the comments section below and we may feature it in the next “Quick Tip” blog post!
I encountered a question regarding some of the environment deployment options available in Hyper-V. At the time, I just gave a quick, off-the-cuff response. On further reflection, I feel like this discussion merits more in-depth treatment. I am going to expand it out to include all of the possibilities: physical, virtual machine with Windows Server, Hyper-V container, and Windows containers. I will also take some time to talk through the Nano Server product. We will talk about strategy in this article, not how-to.
Do I Need to Commit to a Particular Strategy?
Yes… and no. For any given service or system, you will need to pick one and stick with it. Physical and virtual machines don’t differ much, but each of the other choices involves radical differences. Switching to a completely different deployment paradigm would likely involve a painful migration. However, the same Hyper-V host(s) can run all non-physical options simultaneously. Choose the best option for the intended service, not by the host. I can see a case for segregating deployment types across separate Hyper-V and Windows Server hosts. For instance, the Docker engine must run on any host that uses containers. It’s small and can be stopped when not in use, but maybe you don’t want it everywhere. If you want some systems to run only Windows containers, then you don’t need Hyper-V on them at all. However, if you don’t have enough hosts to make separation viable, don’t worry about it.
Check Your Support
We all get a bit starry-eyed when exposed to new, flashy features. However, we all also have at least one software vendor that remains steadfastly locked in a previous decade. I have read a number of articles that talk about moving clunky old apps into shiny new containers (like this one). Just because a thing can be done — even if it works — does not mean that the software vendor will stand behind it. In my experience, most of those vendors hanging on to the ’90s also look for any minor excuse to avoid helping their customers. Before you do anything, make that phone call. If possible, get a support statement in writing.
You have a less intense, but still important, support consideration for internal applications. Containers are a really hot technology right now, but don’t assume that even a majority of developers have the knowledge and experience — or even an interest — in containers. So, if you hire one hotshot dev that kicks out amazing container-based apps but s/he moves on, you might find yourself in a precarious position. I expect this condition to rectify itself over time, but no one can guess at how much time. Those vendors that I implicated in the previous paragraph depend on a large, steady supply of developers willing to specialize in technologies that haven’t matured in a decade, and supply does not seem to be running low.
Understand the Meaning of Your Deployment Options
You don’t have to look far for diagrams comparing containers to more traditional deployments. I’ve noticed that very nearly all of them lack coverage of a single critical component. I’ll show you my diagram; can you spot the difference?
See it? Traditional virtual machines are the only ones that wall off storage for their segregated environments. Most storage activity from a container is transient in nature — it just goes away when the container halts. Anything permanent must go to a location outside the container. While trying to decide which of these options to use, remember to factor all of that in.
I personally will not consider any new physical deployments unless conditions absolutely demand it. I do understand that vendors often dictate that we do ridiculous things. I also understand that it’s a lot easier for a technical author to say, “Vote with your wallet,” than it is to convert critical line-of-business applications away from an industry leader that depends on old technology toward a fresh, unknown startup that doesn’t have the necessary pull to integrate with your industry’s core service providers. Trust me, I get it. All I can say on that: push as hard as you can. If your industry leader is big enough, there are probably user groups. Join them and try to make your collective voices loud enough to make a difference. Physical deployments are expensive, difficult to migrate, difficult to architect against failure and depend on components that cannot be trusted.
Beware myths in this category. You do not need physical domain controllers. Most SQL servers do not need physical operating system environments — the exceptions need hardware because guest clustering has not yet become a match for physical clustering. Even then, if you can embrace new technologies such as scale-out file servers to avoid shared virtual hard disks, you can overcome those restrictions.
Traditional Virtual Machine Deployments
Virtual machines abstract physical machines, giving you almost all of the same benefits of physical with all the advantages of digital abstraction. Some of the primary benefits of using virtual machines instead of one of the other non-physical methods:
Familiarity: you’ve done this before. You, and probably your software vendors, have no fear.
Full segregation: as long as you’re truly virtualizing (meaning no pass-through silliness), your virtual machines can have fully segregated and protected environments. If you need to make them completely separate, then employ the Shielded VM feature. No other method can match that level of separation.
Simple migration: Shared Nothing Live Migration, Live Migration, Quick Migration, and Storage Migration can only be used with real virtual machines.
Checkpoints: Wow, checkpoints can save you from a lot of headaches when used properly. No such thing for containers. Note: a solid argument can be made that a properly used container has no use for checkpoints.
The wall of separation provided by virtual machines comes with baggage, though. They need the most memory, the most space, and usually the most licensing costs.
Containers allow you to wall off compute components. Processes and memory live in their own little space away from everyone else. Disk I/O gets some special treatment, but it does not enjoy the true segregation of a virtual machine. However, kernel interactions (system calls, basically) get processed by the operating system that owns the container.
You have three tests when considering a container deployment:
Support: I think we covered this well enough above. Revisit that section if you skipped it.
Ability to work on today’s Windows Server version: since containers don’t have kernel segregation, they will use whatever kernel the hosting operating system uses. If you’ve got a vendor that just now certified their app on Windows Server 2008 R2 (or, argh, isn’t even there yet), then containers are clearly out. Your app provider needs to be willing to move along with kernel versions as quickly as you upgrade.
A storage dependency compatible with container storage characteristics: the complete nature of the relationship between containers and storage cannot be summed up simply. If an app wasn’t designed with containers in mind, then you need to be clear on how it will behave in a container.
Containers are the thinnest level of abstraction for a process besides running it directly on host hardware. You basically only need the Container role and Docker software running. You don’t need a lot of spare space or memory to run containers. You can use containers to get around a lot of problems introduced by trying to run incompatible processes in the same operating system environment. As long as you can satisfy all of the requirements, they might be your best solution.
Really, I think that management struggles pose the second greatest challenge to container adoption after support. Without installing even more third-party components, Docker control occurs by command line. When you start following the container tutorials that have you start your containers interactively, you’ll learn that the process to get out of a container involves stopping it. So, you’ll have to also pick up some automation skills. For people accustomed to running virtual machines in a GUI installation of Windows Server, the transition will be jarring.
Hyper-V containers represent a middle ground between traditional virtual machines and standard containers. They still do not get their own storage. However, they do have their own kernel apart from the physical host’s kernel. Apply the same tests as for regular containers, minus the “today’s version” bit.
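As a rough sketch only, the three container tests and the Hyper-V exemption from the “today’s version” test can be written as a small Python check. The class and field names are hypothetical, and a real decision involves far more judgment than three booleans can capture.

```python
from dataclasses import dataclass


@dataclass
class AppProfile:
    """Hypothetical yes/no answers to the three tests described above."""
    vendor_supports_containers: bool
    runs_on_current_windows_server: bool
    storage_fits_container_model: bool


def container_options(app):
    """Return the container types that pass the tests for this application."""
    options = []
    # The support and storage tests apply to both container types.
    if not (app.vendor_supports_containers and app.storage_fits_container_model):
        return options
    # Standard containers share the host kernel, so the app must keep up with it.
    if app.runs_on_current_windows_server:
        options.append("standard container")
    # Hyper-V containers bring their own kernel, so the version test is skipped.
    options.append("Hyper-V container")
    return options


legacy_app = AppProfile(
    vendor_supports_containers=True,
    runs_on_current_windows_server=False,
    storage_fits_container_model=True,
)
print(container_options(legacy_app))
```

For the sample profile, only the Hyper-V container option survives, because the application cannot follow the host’s kernel version. An empty list means falling back to a traditional virtual machine.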
Hyper-V containers give two major benefits over standard containers:
No kernel interdependence: Run just about any “guest” operating system that you like. You don’t need to worry (as much) about host upgrades.
Isolation of system calls: I can’t really qualify or quantify the value of this particular point. Security concerns have caused all operating systems to address process isolation for many years now. But, an additional layer of abstraction won’t hurt when security matters.
The biggest “problem” with Hyper-V Containers is the need for Hyper-V. That increases the complexity of the host deployment and (as a much smaller concern) increases the host’s load. Hyper-V containers still beat out traditional virtual machines in the low resource usage department but require more than standard containers. Each Hyper-V container runs a separate operating system, similar to a traditional virtual machine. They retain the not-really-separate storage profile of standard containers, though.
What About Nano Server?
If you’re not familiar with Nano, it’s essentially a Windows build with the supporting bits stripped down to the absolute minimum. As many have noticed (with varying levels of enthusiasm), Nano has been losing capabilities since its inception. As it stands today, you cannot run any of Microsoft’s infrastructure roles within Nano. With all of the above options and the limited applicability of Nano, it might seem that Nano has lost all use.
I would suggest that Nano still has a very viable place. Not every environment will have such a place, of course. If you can’t find a use for Nano, don’t force it. To understand where it might be suited, let’s start with a simplified bit of backstory on Nano. Why was infrastructure support stripped from it? Two reasons:
Common usage 1: Administrators were not implementing Nano for these roles in meaningful quantities. Microsoft has a history of moving away from features that no one uses. Seems fair.
Common usage 2: Developers were implementing Nano in droves. Microsoft turned their attention to them. Also seems fair.
Practicality: Of course, some administrators did use Nano for infrastructure features. And they wanted more. And more. And in trying to satisfy those requests, Nano started growing to the point that once it checked everyone’s boxes, it was essentially Windows Server Core with a different name.
I would like to give you a simple flowchart, but I don’t think that it would properly address all considerations. I think that it would also create a false sense of prioritization. Other than “not physical”, I don’t believe that any reasonable default answer exists. A lot of people don’t like to hear it, but I also think that “familiarity” and “comfort” carry a great deal of weight. I can’t possibly have even a rudimentary grasp of all possible decision points when it comes to the sort of applications that might be concerning you.
I personally place “supportability” at the highest place on any decision tree. I never do anything that might get me excluded from support. Even if I have the skills to fix any problems that arise, I will eventually move on from this position. Becoming irreplaceable sounds good in job security theory, but never makes for good practice.
You have other things to think about as well. What about backup? Hopefully, a container only needs its definition files to be quickly rebuilt, but people don’t always use things the way that the engineers intended. On that basis alone, virtual machines will continue to exist for a very long time.
Need More Help with the Decision?
It can be daunting to make this determination with something as new as containers in the mix. Fear not! Altaro is hosting a VERY exciting webinar later this month with this topic specifically in mind. My good friend and Altaro Technical Evangelist Andy Syrewicze will be officiating an AMA-style webinar with Microsoft’s very own Ben Armstrong on the subject of containers, and the topic written about here will be expanded upon greatly.
Authors commonly struggle with the blank, empty starting page of a new work. So, if you’ve just installed Hyper-V and don’t know what to do with that empty space, you’re in good company. Let’s take a quick tour of the steps from setup to production-ready.
1. Make a Virtual Machine
It might seem like jumping ahead, but go ahead and create a virtual machine now. It doesn’t matter what you name it. It doesn’t matter how you configure it. It won’t be ready to use and you might get something wrong — maybe a lot of somethings — but go ahead. You get three things out of this exercise:
It’s what we authors do when we aren’t certain how to get started with writing. Just do something. It doesn’t matter what. Just break up that empty white space.
You learn that none of it is permanent. Make a mistake? Oh well. Change it.
You have a focused goal. You know that the VM won’t function without some more work. Instead of some nebulous “get things going” problem, you have a specific virtual machine to fix up.
If you start here, then you’ll have no network for the virtual machine and you may wind up with it sitting on the C: drive. That’s OK.
If you want to know the basic steps for how to create a virtual machine in Hyper-V Manager, start with this article.
2. Install Updates and Configure the System
I try to get the dull, unpleasant things out of the way before I do anything with Hyper-V. In no set order:
Join the host to the domain, if you have one
Go through the driver and Windows Updates
Disable SMB 1. Make sure to remove the feature and use the PowerShell disable command
3. Configure Hyper-V Networking
For Hyper-V configuration, I always start with my networking stack. You will likely spend a lot of time on this, especially if you’re still new to Hyper-V.
For a server deployment, I recommend that you start with my overview article. It will help you to conceptualize and create a diagram of your configuration design before you build anything. At the end, the article contains further useful links to how-to and in-depth articles: https://www.altaro.com/hyper-v/simple-guide-hyper-v-networking/
4. Configure Hyper-V Storage
Storage often needs a lot of time to configure correctly as well.
First, you need to set up the basic parts of storage, such as SAN LUNs and volumes. I’d like to give you a 100% thorough walk-through on it (perhaps at a later date), but I couldn’t possibly cover more than a few options. However, I’ve covered a few common methods in this article: https://www.altaro.com/hyper-v/storage-and-hyper-v-part-6-how-to-connect/. I didn’t cover Fibre Channel because no two vendors are similar enough to write a good generic article. I didn’t cover Storage Spaces Direct because it didn’t exist yet and I still don’t have an S2D cluster of my own to instruct from.
Whatever you choose to use for storage, you need at least one NTFS or ReFS location to hold your VMs. I’m not even going to entertain any discussion about pass-through disks because, seriously, join this decade and stop with that nonsense already. I’m still recommending NTFS because I’m not quite sold on ReFS for Hyper-V yet, but ReFS will work. If you do choose ReFS, make sure that your backup/recovery vendor supports it.
5. Configure Hyper-V Host Settings
You probably won’t want to continue using Hyper-V’s defaults for long. Storage, especially, will probably not be what you want. Let’s modify some defaults. Right-click your host in Hyper-V Manager and click Hyper-V Settings.
This window has many settings, far more than I want to cover in a quick start article. I’ll show you a few things, though.
Let’s start with the two storage tabs:
You can rehome these anywhere that you like. Note:
For the Virtual Hard Disks setting, all new disks created using default settings will appear directly in that folder
For the Virtual Machines setting, all non-disk VM files will be created in special subfolders
In my case, my host will own local and clustered VMs. I’ll set my defaults to a local folder, but one that’s not as deep as what Hyper-V starts with.
Go explore a bit. Look at the rest of the default settings. Google what they mean, if you need to. If you’ll be doing Shared Nothing Live Migrations, I recommend that you enable migrations on the Live Migrations tab.
6. Fix Up Your VM’s Settings
Remember that VM that I told you to create back in step one? I hope you did that because now you get to practice working with real settings on a real virtual machine. In this step, we’ll focus on the simple, direct settings. Right-click on your virtual machine and click Settings.
If you followed right through, then the VM’s virtual network adapter can’t communicate because it has no switch connection. So, jump down to the Network Adapter tab. In the Virtual Switch setting, where it says, Not connected, change it to that switch that you created in step 3.
Again, poke through and learn about the settings for your virtual machine. You’ll have a lot more to look at than you did for the host. Take special notice of:
Memory: Each VM defaults to 1GB of dynamic memory. You can only change a few settings during creation. You can change many more now.
Processor: Each VM defaults to a single virtual CPU. You’ll probably want to bump that up to at least 2. We have a little guidance on that, but the short version: don’t stress out about it too much.
Automatic start and stop actions: These only work for VMs that won’t be clustered.
Check out the rest of it. Look up anything that seems interesting.
7. Practice with Advanced Activities and Settings
If you followed both step one and the storage location bit of step five, then that virtual machine might not be in the location that you desire. Not a problem at all. Right-click it and choose Move. On the relevant wizard page, select Move the virtual machine’s storage:
Let’s say that you decided that you didn’t like the name of the virtual machine that you created in step one. Or, that you were just fine with the name, but you didn’t like the name of its VHDX. You can change the virtual machine’s name very simply: just highlight it and press [F2] or right-click it and select Rename. Hyper-V stores virtual machines’ names as properties in their xml/vmcx files, so you don’t need to change those. If you put the VM in a specially-named folder, then you can use the instructions above to move it to a new one. The VHDX doesn’t change so easily, though.
Let’s rename a virtual machine’s virtual hard disk file:
The virtual machine must be off. Sorry.
On the virtual hard disk’s tab in the virtual machine’s settings, click Remove:
Click Apply. That will remove the disk but leave the window open. We’ll be coming back momentarily.
Use whatever method you like to rename the VHDX file.
Back in the Hyper-V virtual machine’s settings, you should have been left on the controller tab for the disk that you removed, with Hard Drive selected. Click Add:
Browse to the renamed file:
Your virtual machine can now be started with its newly renamed hard disk.
Tip: If you feel brave, you can try to rename the file in the browse dialog, thereby skipping the need to drop out to the operating system for the rename. I have had mixed results with this due to permissions and other environmental factors.
Tip: If you want to perform a storage migration and rename a VHDX, you can wait to perform the storage migration until you have detached the virtual hard disk. The remaining files will transfer instantly and Hyper-V won’t copy the VHDX. After you have performed the storage migration, you can manually move the VHDX to its new home. If the same volume hosts the destination location, the move will occur almost instantly. From there, you can proceed with the rename and attach operations. You can save substantial amounts of time that way.
Bonus round: All of these things can be scripted.
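As one hedged illustration of scripting the VHDX rename, this Python sketch assembles the equivalent PowerShell commands in the same order as the manual steps. Stop-VM, Remove-VMHardDiskDrive, and Add-VMHardDiskDrive come from the Hyper-V PowerShell module, and Rename-Item is a core cmdlet. The VM name, file paths, and controller slot are placeholders, and the SCSI default is an assumption that you should check against your own VM. The sketch only prints the commands and runs nothing.

```python
from pathlib import PureWindowsPath


def rename_vhdx_commands(vm_name, old_path, new_file_name,
                         controller_type="SCSI", controller_number=0,
                         controller_location=0):
    """Build the PowerShell commands for the detach, rename, attach sequence."""
    # The renamed file stays in the same folder as the original.
    new_path = PureWindowsPath(old_path).with_name(new_file_name)
    # Hypothetical controller slot; it must match where the disk is attached.
    slot = (f"-VMName '{vm_name}' -ControllerType {controller_type} "
            f"-ControllerNumber {controller_number} "
            f"-ControllerLocation {controller_location}")
    return [
        f"Stop-VM -Name '{vm_name}'",  # the virtual machine must be off
        f"Remove-VMHardDiskDrive {slot}",  # detach only; the file stays on disk
        f"Rename-Item -Path '{old_path}' -NewName '{new_file_name}'",
        f"Add-VMHardDiskDrive {slot} -Path '{new_path}'",  # attach the renamed file
    ]


for command in rename_vhdx_commands(
        "demo-vm", r"D:\VMs\demo-vm\disk.vhdx", "demo-vm-os.vhdx"):
    print(command)
```

Review the printed commands before pasting them into an elevated PowerShell session. A generation 1 VM that boots from IDE would need a different controller type.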
In just a few simple steps, you learned the most important things about Hyper-V. What’s next? Installing a guest operating system, of course. Treat that virtual machine like a physical machine, and you’ll figure it out in no time.
Need any Help?
If you’re experiencing serious technical difficulties you should contact the Microsoft support team but for general pointers and advice, I’d love to help you out! Write to me using the comment section below and I’ll get back to you ASAP!
In case you missed it, Microsoft has announced the availability of the first public preview of Windows Server 2019. Now that we’ve all had a chance to read the announcements and maybe take it for a spin, let’s take a closer peek at what all of this will mean.
Remember the Servicing Channels
A while ago, I wrote an article about the Semi-Annual Channel (SAC) to explain the Windows Server product releasing on a six-month cadence. Windows Server 2019 belongs to the Long-Term Servicing Channel which only has new releases every few years. It is the direct descendant of Windows Server 2016. However, it brings many features from the SAC into the LTSC. It also showcases many items that have only appeared in Insider Builds.
The presence of the GUI marks the primary difference between LTSC and SAC. In case you’ve heard any rumors or had any concerns about Microsoft removing the GUI in Windows Server, you can lay them to rest right now. Windows Server 2019 has a GUI, as will all LTSC builds into the foreseeable future.
An Overview of Windows Server 2019’s Direction
You can read through the article that I linked above for Microsoft’s take on the new release. I’ll briefly recap the highlights with a few points of my own.
Hybrid Cloud Goal
I have no real idea how much interest small businesses have in hybrid solutions. I doubt that many businesses, regardless of size, have zero cloud footprint anymore. However, an authentic hybrid solution may not (yet?) make sense to the typical small business. But, given the intended multi-year lifespan of LTSC, this might be the version that plugs you in — even if that happens on some future date.
Honestly, I think that this might be the area of Windows Server 2019 with the greatest impact. The line between Azure and on-premises continues to blur and I believe that this release will serve as the gateway.
New Security Features
Windows Server 2019 includes several interesting features. On the Hyper-V front, we will get three new features related to Shielded VMs:
VMs running Linux can be shielded
If you have administrative credentials to the guest operating system, you will be able to use VMConnect
You will be able to designate Encrypted Networks in your software-defined networks to protect inter-server traffic
Outside of Hyper-V, you also get Windows Defender Advanced Threat Protection baked right in. If you’ve only met Windows Defender on client platforms, then you’re in for a treat. For us Hyper-V admins, ATP allows us to stop worrying that our antivirus program will trash our Hyper-V hosts or virtual machines.
Advanced Linux Interoperability
If you’ve already tinkered with the Windows Subsystem for Linux on Windows 10, then you’re ready for the next one. Windows Server 2019 sports WSL as well. I have a few ideas for this one myself, but I would really like to hear what other people plan to do with it.
One thing to point out: WSL does not involve containers or virtual machines (so, no nesting concerns, either). WSL really is your Linux distribution running right on top of the Windows kernel. As you can imagine, that involves some trickery to get everything to mesh. You might occasionally detect a seam. For instance, you cannot set up a WSL instance to run daemons and other background operations. However, for situations where it has a use, you certainly can’t argue with this sort of resource usage:
For situations where WSL does not address your Linux problem, you still have containers and virtual machines.
Increased Focus on Hyper-Convergence
Windows Server 2019 does not add a huge amount of capability to Windows Server for hyper-convergence. The greatest new power it offers pertains to management. The Honolulu project includes a high-powered graphical interface for Storage Spaces Direct. In addition to control, it includes displays of performance history.
One thing that Windows Server 2019 does not change is S2D’s target customer. You still need Datacenter Edition and multiple physical systems. In order for those systems to perform acceptably, you’ll need more than gigabit networking as well. With all of S2D’s wonders, small businesses will not be able to afford to buy in anytime soon.
If you get a copy of the preview, then you’ll also eventually stumble upon the Insider’s announcement page where you’ll find a few other listed features and goals. I found several enticing items.
In-place upgrades have been available forever. In your experience, how often have they resulted in a reliable, trouble-free system? For me, not often. Even though you can run an in-place upgrade more quickly, almost everyone chooses to perform a completely new installation and migrate services and data. Prior to virtualization, we typically bought new physical systems as replacements to old and replaced our operating systems instead of upgrading them. Virtualization upset the balance. But, we still chose new installs.
Microsoft has been working very hard to make in-place upgrades into a viable choice. I don’t know how that will work out. It may take many more iterations for them to gain our trust, and it would take very little for them to lose it. I will try this out, but I won’t intentionally try to influence anyone’s expectations.
A cluster set is essentially a cluster of clusters. This technology was designed to break individual resources free of any single cluster in order to dramatically increase the scale of clustering.
Of course, I tend to keep my focus more on how Microsoft technologies can help smaller businesses, the kind that won’t even approach the 64-node limit of a standard cluster. That said, I want to investigate some possibilities of the technology to see if it might have some other uses.
My Take on Windows Server 2019
The divergence of SAC from LTSC creates an interesting situation for many of us. If you’re in a small business that doesn’t use SAC, you probably also don’t need many of these new features. Finding them in 2019 probably doesn’t change much for you, either. If you’re in a larger organization that has adopted SAC, then you could just continue using SAC. LTSC adds the protection of long-term support, but that’s about it.
For the smaller organizations, the appeal of the GUI is nice. But, what else is there for you?
In my mind, I’ve begun thinking of these two channels like this:
SAC: Use for Microsoft-only uses, such as Active Directory, DHCP, DNS, file serving, Hyper-V, etc.
LTSC: Use to operate third-party line-of-business applications
From the feature sets, I don’t really know what Windows Server 2019 gets you — where’s the value-add from an upgrade? I expect that most small and large institutions will take it only organically. However, don’t forget that Microsoft always keeps the greatest focus on the current version of Windows. Aside from security fixes, things just sort of stop happening for older versions.
For those groups, I find two things enticing about Windows Server 2019 over Windows Server 2016:
Windows Defender Advanced Threat Protection
Enhanced support for hybrid cloud
I think that organizations that skew toward “medium”-sized (no fixed definition exists) will get the most value out of Windows Server 2019. These organizations probably don’t keep up with the rapid release of SAC, but will still want access to the newer features. They get the dual comfort of support and a GUI.
That said, don’t lose track of the hybrid cloud focus. Windows Server 2019 just might start encouraging a wider audience to look into the impressive offerings of Azure.
How to Get Started
It’s time to have some fun! Get your own copy of the Windows Server 2019 Preview and make up your own mind.
I’m never entirely certain how it happens, but some people seem to forget what “preview” means. I want everyone reading this article to keep some things in mind:
“Preview” means “pre-release”. “Pre-release” means “not ready for release”. “Not ready for release” means “not production ready”. Do not use this software to hold up important services. Do not expect it to behave all of the time. For instance, when I installed into virtual machines on my 2016 system, the guest OS locked up so hard that VMConnect failed to work and I couldn’t even force the virtual machines offline. I had to reboot the host to get them to work (that fixed them). Preview releases are intended to get lots of people to play with the bits so that they can be massaged into a production-ready product.
Preview installs cannot live forever. All of them have a concrete, time-bombed expiration date. Do not become overly attached to your preview installs.
Preview releases update frequently. Insiders are already accustomed to that. Regular administrators might find it to be a bit of a shock.
More preview releases will come. That would be a good time to test out in-place upgrade, no? You’ll know them by their build numbers. WS2019 starts with build 17623.
Try out the new build. Report back! Don’t report technical details to me though. I won’t mind hearing from you, of course, but I can’t do much for you. Use the Windows Insiders feedback forum: https://techcommunity.microsoft.com/t5/Windows-Server-Insiders/bd-p/WindowsServerInsiders. However, I’d love to hear your general thoughts about Windows Server 2019. Do the new features work well for you? Disappointed you didn’t get the upgrade you were expecting? Let me know in the comments below.
As an infrastructure hypervisor, Hyper-V hits all the high notes. However, it misses on some of the management aspects. You can find many control features in System Center Virtual Machine Manager, but I don’t feel that product was well-designed and the pricing places it out of reach of many small businesses anyway. Often, we don’t even need a heavy management layer; sometimes just one or two desires go unmet by the free tools. Of those, admins commonly request the ability to create and deploy templates. The free tools don’t directly include that functionality, but you can approximate it with only a bit of work.
The Concept of the Gold Image
You will be building “gold” or “master” (or even “gold master”) images as the core of this solution. This means that you’ll spend at least a little time configuring an environment (or several environments) to your liking. Instead of sending those directly to production, you’ll let them sit cold. When you want to deploy a new system, you use one of those as a base rather than building the instance up from scratch.
As you might have guessed, we do need to take some special steps with these images. They are not merely regular systems that have been turned off. We “generalize” them first, using a designated tool called “sysprep”. That process strips away all known unique identifiers for a Windows instance. The next time anyone boots that image, they’ll be presented with the same screens that you would see after freshly installing Windows. However, most non-identifying customization, such as software installations, will remain.
Do I Need Gold Images?
The simpler your environment, the less the concept of the gold image seems to fit. I wouldn’t write it off entirely, though. Even with rare usage, you can use a gold image to jump ahead of a lot of the drudgery of setting up a new system. If you deploy from the same image only twice, it will be worth your time.
For any environment larger than a few servers, the need for gold images becomes apparent quickly. Otherwise, you wind up spending significant amounts of time designing and deploying new systems. Since major parts of new server deployments share steps (and the equivalent involved time), you get the best usage by leveraging gold images.
Usually, the resistance to such images revolves around the work involved. People often don’t wish to invest much time in something whose final product will mostly just sit idle. I think that there’s also something to that “all-new” feeling of a freshly built image that you lose with gold images. The demands of modern business don’t really allow for these archaic notions. Do the work once, maybe some maintenance effort later, and ultimately save yourself and your colleagues many hours.
Should I Image Workstation or Server Environments?
The majority of my virtualization experience involves server instances. To that end, I’ve been using some sort of template strategy ever since I started using Hyper-V. I only build all-new images when new operating systems debut or significant updates release. Even if I wasn’t sure that I’d ever deploy a server OS more than once, I would absolutely build an image for it.
Workstation OSes have a different calculus. If you’ll be building a Microsoft virtual-machine RDS deployment, then you cannot avoid gold images. If you’re using only hardware deployments, then you might still image, but probably not the way that I’m talking about in this article. I will not illustrate workstation OSes, as the particulars of the process do not deviate meaningfully from server instances.
What About OS and Software Keys?
For operating systems, you have three basic types:
Keyed during install: This key will be retained after the sysprep, so you’ll need to use a key with enough remaining activations. KMS keys work best for this. With others, you’ll need to be prepared to change the key after deployment if the situation calls for it. If you have Windows Server Datacenter Edition as your hypervisor, then you can use the AVMA keys. If you don’t have Datacenter Edition, then you could technically still use the keys, but you’ll have to change them immediately after deployment. I have no idea how this plays legally, so consider that a last-ditch risky move.
Keyed after install: This usually happens with volume licensing images. These are the best because you really don’t have to plan anything. Key it afterward. Of course, you also need to qualify for volume licensing in order to use this option at all, so…
OEM keys: I’m not even going to wade into that. Ask your reseller.
If you use the ADK (revisited a bit in an upcoming section), you have ways to address key problems.
As for software, you’ll have all sorts of issues with that. Most retain their keys. Lots of them have activation routines, too, so there’s that. And all of the things that come with it. You will need to think through and test. It will be worth the effort far more often than not.
What About Linux Gold Images?
Yes, you most certainly can create gold masters of Linux. In a way, it can be easier. Linux doesn’t use a lot of fancy computer identification techniques or have system-specific GUIDs embedded anywhere. Usually, you can just duplicate a Linux system at will and just rename it and assign a new IP.
Unfortunately, that’s not always the case. Because exceptions are so rare, there’s also no singular built-in tool to handle the items that need generalization. The only problem that I’ve encountered so far is with SSH keys. I found one set of instructions to regenerate them: https://serverfault.com/questions/471327/how-to-change-a-ssh-host-key.
Creating Gold Images for Hyper-V Templating
The overall process:
Create a virtual machine
Install the operating system
If that sounds familiar, you probably do something like that for physical systems as well.
Let’s go over the steps in more detail.
Creating the Virtual Machine and Installing the Operating System
You start by simply doing what you might have done any number of times before: create a new virtual machine. One thing really matters: the virtual machine generation. Whatever generation you choose for the gold image will be the generation of all virtual machines that you build on it. Sure, there are some conversion techniques… but why use them? If you will need Generation 1 and Generation 2 VMs, then build two templates.
The rest of the settings of the virtual machine that you use for creating a gold image do not matter (unless the dictates of some particular software package override). You have more than one option for image storage, but in all cases, you will deploy to unique virtual machines whose options can be changed.
Once you’ve got your virtual machine created, install Windows Server (or whatever) as normal. A note, especially for desktop deployments: many guides mention booting in Audit mode, which I have never done. It appears to matter most when Windows Store applications are in use.
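As a rough sketch of the creation step in PowerShell, the following builds one template virtual machine per generation. The VM names, the C:\LocalVMs path, the sizes, and the virtual switch name 'vSwitch' are placeholders chosen for illustration; substitute your own.

```powershell
# Hypothetical names, paths, and sizes; adjust for your host.
$templates = @(
    @{ Name = 'gold-server-gen1'; Generation = 1 },
    @{ Name = 'gold-server-gen2'; Generation = 2 }
)

foreach ($t in $templates) {
    # The generation chosen here is the generation of every VM later built from this image.
    New-VM -Name $t.Name `
        -Generation $t.Generation `
        -MemoryStartupBytes 2GB `
        -NewVHDPath "C:\LocalVMs\Virtual Hard Disks\$($t.Name).vhdx" `
        -NewVHDSizeBytes 60GB `
        -SwitchName 'vSwitch'
}
```

Attach your installation ISO and install the operating system in each one as usual.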
Customizing the Gold Image
If you’re working on your first image, I would not go very far with this. You want a generic image to start with. For initial images, I tend to insert things like BGInfo that I want on every system. You can then use this base image to create more specialized images.
I have plans for future articles that will expand on your options for customization. You can perform simple things, like installing software. You can do more complicated things, such as using the Windows Assessment and Deployment Kit (ADK). One of the several useful aspects of ADK is the ability to control keying.
Tip: If you have software that requires .NET 3.5, you can save a great deal of time by having a branch of images that include that feature pre-installed.
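One way to add that feature inside the guest, sketched under the assumption that the Windows Server installation media is mounted as drive D: (the drive letter is a placeholder):

```powershell
# Run inside the gold image guest, elevated.
# .NET 3.5 payload files are not kept on disk by default, so point at the media's sxs folder.
Install-WindowsFeature -Name NET-Framework-Core -Source 'D:\sources\sxs'
```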
Just remember that you want to create generic images. Do not try to create a carbon-copy of an intended live system. If that’s your goal (say, for quick rollback to a known-good build), then create the image that you want as a live system and store a permanent backup copy. You could use an export if you like.
Very important: Patch the image fully, but only after you have all of the roles, features, and applications installed.
Generalize the Gold Image
Once you have your image built the way that you like it, you need to seal it. That process will make the image generic, freezing it into a state from which it can be repeatedly deployed. Windows (and Windows Server) includes that tool natively: sysprep.
The best way to invoke sysprep is by a simple command-line process. Use these switches: /generalize /oobe /shutdown /mode:vm
The first three parameters are standard. We can use the last one because we’re creating Hyper-V images. It will ensure that the image doesn’t spend a lot of time worrying about hardware.
Tip: If you want to use the aforementioned Audit Mode so that you can work with software packages, use /audit instead of /oobe.
Tip: You can also just run sysprep.exe to get the user interface where you can pick all of these options except the mode. Your image(s) will work just fine if you don’t use /mode:vm.
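A minimal sketch of the full invocation from PowerShell, assuming the default sysprep location under the Windows directory:

```powershell
# Run inside the gold image guest, from an elevated prompt.
$sysprep = Join-Path $env:SystemRoot 'System32\Sysprep\sysprep.exe'

# /generalize  strips the instance's unique identifiers
# /oobe        makes the next boot show the normal first-run setup screens
# /shutdown    powers the guest off when sysprep finishes
# /mode:vm     marks the image as VM-only so it skips hardware work on next boot
& $sysprep /generalize /oobe /shutdown /mode:vm
```

Swap /oobe for /audit if you want the next boot to land in Audit mode instead.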
Once the sysprep operation completes, it will stop the virtual machine. At that point, consider it to be in a “cold” state. Starting it up will launch the continuation of a setup process. So, you don’t want to do that. Instead, store the image so that it can be used later.
Storing a Gold Image
Decide early how you want to deploy virtual machines from this image. You have the option of creating all-new virtual machines at each go and re-using a copy of the VHDX. Alternatively, you can import a virtual machine as a copy. I use both techniques, so I recommend export. That way, you’ll have the base virtual machine and the VHDX so you can use either as suits you.
Image storage tip: Use deduplicated storage. In my test lab, I keep mine on a volume using Windows Server deduplication in Hyper-V mode. That mode only targets VHDX files and was intended for running VDI deployments. It seems to work well for cold image storage, as well. I have not tried with the normal file mode.
VHDX Copy Storage
If you only want to store the VHDX, then copy it to a safe location. Give the VHDX a very clear name as to its usage. Don’t forget that Windows allows you to use very, very long filenames. Delete the root virtual machine afterward and clean up after it.
The benefit of copy storage is that you can easily lay out all of your gold image VHDXs side-by-side in the same folder and not need to keep track of all of those virtual machine definition files and folders.
Exported Image Storage
By exporting the virtual machine, you can leverage import functionality to easily deploy virtual machines without much effort on your part. There are some downsides, but they’re not awful:
The export process takes a tiny bit of work. It’s not much, but…
When importing, the name of the VHDX cannot be changed. So, you wind up with a newly-deployed virtual machine that uses the same VHDX name as your gold image. That problem can be fixed, of course, but it’s extra work.
I discovered that we’ve never written an article on exporting virtual machines. I’ll rectify that in a future article and we’ll link this one to it. Fortunately, the process is not difficult. Start by right-clicking your virtual machine and clicking the Export option, then follow the wizard from there.
Tip: Disconnect any ISO images prior to exporting. Otherwise, the export will make a copy of that ISO to go with the VM image and it will remain permanently attached and deployed with each import.
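The same export can be done from PowerShell. This is a sketch only; the VM name and the D:\GoldImages destination are placeholders.

```powershell
$vmName     = 'gold-server-gen2'   # hypothetical gold image VM
$exportRoot = 'D:\GoldImages'      # hypothetical storage location

# Eject any mounted ISO first so it is not copied along with the export.
Get-VMDvdDrive -VMName $vmName | Set-VMDvdDrive -Path $null

# Export-VM creates a folder named after the VM underneath $exportRoot.
Export-VM -Name $vmName -Path $exportRoot
```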
Deploying Hyper-V Virtual Machines from Gold Images
From these images that you created, you can build a new virtual machine and attach a copy of the VHDX or you can import a copy.
Remember that Hyper-V Manager doesn’t have much for configuration options during VM creation. Set the memory and CPU and network up before proceeding.
To finish up, copy the gold image’s VHDX into whatever target location you like. You can use a general “Virtual Hard Disks” folder like the Hyper-V default settings do, or you can drop it in a folder named after the VM. It really doesn’t matter, as long as the Hyper-V host can reach the location. If it were me, I would also rename the VHDX copy to something that suits the VM.
Once you have the VHDX placed, use Hyper-V Manager to attach it to the virtual machine:
Once you hit OK, you can boot up the VM. It will start off like a newly-installed Windows machine, but all your customizations will be in place.
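The copy-and-attach approach can also be scripted. The sketch below assumes a Generation 2 gold image stored at a path I made up, a host folder of C:\LocalVMs, and a virtual switch named 'vSwitch'; none of those names come from a real environment.

```powershell
$vmName  = 'svfile01'                                   # hypothetical new VM
$goldVhd = 'D:\GoldImages\VHDX\server-gen2-gold.vhdx'   # hypothetical gold image
$vmRoot  = 'C:\LocalVMs'
$vhdDir  = Join-Path $vmRoot 'Virtual Hard Disks'
$newVhd  = Join-Path $vhdDir "$vmName.vhdx"

# Copy the gold VHDX and give the copy a name that suits the VM.
New-Item -Path $vhdDir -ItemType Directory -Force | Out-Null
Copy-Item -Path $goldVhd -Destination $newVhd

# The generation must match the generation of the gold image.
New-VM -Name $vmName -Generation 2 -MemoryStartupBytes 2GB `
    -Path $vmRoot -VHDPath $newVhd -SwitchName 'vSwitch'
Set-VMProcessor -VMName $vmName -Count 2

Start-VM -Name $vmName
```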
Deploying with Import
Importing saves you a bit of work in exchange for a bit of different but optional work.
Especially ensure that you choose to Copy. Either of the other choices will cause problems for the gold image.
Now, the fun parts. It will import with the name of the exported VM. That’s probably not what you want. You’ll need to:
Rename the virtual machine
Rename the VHDX(s)
Detach the VHDX(s)
You may need to rename the folder(s), depending on how you deployed. That hasn’t been a problem for me, so far.
At this point, you are mostly finished. The one thing to keep in mind: the guest operating system will have a generic name unrelated to the virtual machine’s name. Don’t forget to fix that. Also, IP addresses will not be retained, etc.
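For reference, an import-as-copy with the follow-up renames might look like the sketch below. It assumes an export folder produced on Windows Server 2016 or later (older hosts use .xml rather than .vmcx configuration files), and every name and path in it is a placeholder.

```powershell
$exportDir = 'D:\GoldImages\gold-server-gen2'   # hypothetical export folder
$newName   = 'svapp01'                          # hypothetical new VM name
$vmRoot    = 'C:\LocalVMs'

# Locate the exported configuration file.
$config = Get-ChildItem -Path (Join-Path $exportDir 'Virtual Machines') -Filter '*.vmcx' |
    Select-Object -First 1

# -Copy leaves the gold image untouched; -GenerateNewId gives the copy its own identity.
$vm = Import-VM -Path $config.FullName -Copy -GenerateNewId `
    -VirtualMachinePath (Join-Path $vmRoot $newName) `
    -VhdDestinationPath (Join-Path $vmRoot "$newName\Virtual Hard Disks")

Rename-VM -VM $vm -NewName $newName

# Rename each VHDX file, then point the VM's disk at the new file name.
$index = 0
foreach ($disk in Get-VMHardDiskDrive -VM $vm) {
    $newLeaf = "$newName-disk$index.vhdx"
    $newPath = Join-Path (Split-Path $disk.Path -Parent) $newLeaf
    Rename-Item -Path $disk.Path -NewName $newLeaf
    Set-VMHardDiskDrive -VMHardDiskDrive $disk -Path $newPath
    $index++
}
```

The guest operating system still needs its own rename and IP configuration after first boot.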
Further Work and Consideration
What I’ve shown you only takes you through some simplistic builds. You can really turn this into a powerhouse deployment system. Things to think about:
After you have a basic build, import it, customize it further and sysprep it again. Repeat as necessary.
Microsoft places limits on how many times an image can be sysprepped. Therefore, always try to work from the first image rather than from a deep child.
Traditionally, the preferred choice for a cluster quorum witness has been some type of networked disk. Small SAN LUNs do the trick nicely. Things have changed a bit, increasing the viability of the file share witness. You can configure one easily in a few simple steps.
Prerequisites for a File Share Witness
Just as with a cluster disk witness, you need to do a bit of work in advance.
First, you need to pick a system that will host the share. It needs to be reliable, although not necessarily bulletproof. We’ll talk about that after the how-to.
Space used on that share will be extremely tiny. Mine doesn’t even use 100 bytes.
You need to set adequate share and NTFS permissions. I just used my domain group that includes the cluster and nodes and gave it Full Control for both and that worked. From my observations, only the domain account that belongs to the cluster name object is used. It seems that Change on the share and Modify on NTFS are adequate permission levels.
If you have firewalls configured, the cluster node needs to be able to reach the share’s host on port 445.
The cluster will create its own folder underneath the root of this share. When you configure the witness, it will generate a GUID to use as the folder name. Therefore, you can point multiple clusters at the same share.
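The prerequisites above can be sketched in PowerShell as follows. The folder, share name, host name 'svstore01', and the cluster name object account 'DOMAIN\HVCLUSTER1$' are all hypothetical.

```powershell
# Run on the system that will host the share.
$folder  = 'C:\Shares\ClusterWitness'
$account = 'DOMAIN\HVCLUSTER1$'   # computer account of the cluster name object

New-Item -Path $folder -ItemType Directory -Force | Out-Null

# NTFS: grant Modify, inherited by subfolders (CI) and files (OI).
icacls.exe $folder /grant "${account}:(OI)(CI)M"

# Share: grant Change.
New-SmbShare -Name 'ClusterWitness' -Path $folder -ChangeAccess $account

# Run this one from a cluster node: confirm that TCP 445 is reachable through any firewalls.
Test-NetConnection -ComputerName 'svstore01' -Port 445
```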
Using PowerShell to Configure a File Share Witness
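A minimal sketch, assuming Windows Server 2012 R2 or later, a session on one of the cluster nodes, and a hypothetical share at \\svstore01\ClusterWitness:

```powershell
# Point the cluster's quorum at the file share witness.
Set-ClusterQuorum -FileShareWitness '\\svstore01\ClusterWitness'

# Show the resulting quorum configuration.
Get-ClusterQuorum
```

To run it remotely, add -Cluster with the cluster's name to both commands.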
Using Failover Cluster Manager to Configure a File Share Witness
Failover Cluster Manager has two different ways to get to the same screen.
In Failover Cluster Manager, right-click the cluster’s root node, go to More Actions, and click Configure Cluster Quorum Settings.
Click Next on the introductory screen.
If you choose Advanced quorum configuration, you can change which nodes have quorum votes. That is not part of this article, but you’ll eventually get to the same screen that I’m taking you to. For my directions, choose Select the quorum witness. Click Next.
Choose Configure a file share witness and click Next.
You can manually enter or browse to the shared location.
The next screen summarizes your proposed changes. Review them and click Next when ready. The cluster will attempt to establish your setting.
The final screen shows the results of your action.
Checking/Verifying Your Cluster Quorum Settings in Failover Cluster Manager
The home screen in Failover Cluster Manager shows most of the pertinent information for a file share witness. Look in the Cluster Core Resources section:
You can see the most important things: the state of the share and where it lives. You cannot see the name of the sub-folder used.
Checking/Verifying Your Cluster Quorum Settings in PowerShell
The built-in PowerShell module does not join the information quite as gracefully. You can quickly see the status and mode with Get-ClusterResource:
If desired, you can pare it down a bit with Get-ClusterResource -Name 'File Share Witness' or just Get-ClusterResource 'File Share Witness'.
That will output the GUID used in the other registry keys and the subfolder that was created on the file share witness. You can use it to retrieve them:
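As a sketch of these checks, assuming the witness resource carries the default name 'File Share Witness' and that the resource's Id property is the GUID in question:

```powershell
$witness = Get-ClusterResource -Name 'File Share Witness'

# Status plus the resource's GUID.
$witness | Format-List Name, State, OwnerGroup, ResourceType, Id

# Private properties of the resource, including the share path it points at.
$witness | Get-ClusterParameter

# Overall quorum configuration for the cluster.
Get-ClusterQuorum
```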
Troubleshooting a Cluster File Share Witness
There isn’t a lot to the file share witness. If the cluster says that it’s offline, then it can’t reach the share or it doesn’t have the necessary permissions.
I have noticed that the cluster takes longer to recognize that the share has come back online than a cluster disk. You can force it to come back online more quickly by right-clicking it in Failover Cluster Manager (screenshot above) and clicking Bring Online.
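The PowerShell equivalent of Bring Online, again assuming the default resource name:

```powershell
# Ask the cluster to bring the witness back online now rather than waiting for it to notice.
Start-ClusterResource -Name 'File Share Witness'
```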
Why Use a File Share Witness for a Hyper-V Cluster?
With the how-to out of the way, we can talk about why. For me, it was because of the evolution of the cluster that I use to write these articles. I’ve been running the Microsoft iSCSI Target. But, as my cluster matured, I’ve been moving toward SMB. I don’t have anything left on iSCSI except quorum. Keeping it makes no sense.
To decide for yourself, analyze your options:
No witness: In a cluster with an odd number of nodes, you can go without a witness. However, with Microsoft Failover Clustering’s Dynamic Quorum technology, you’ll generally want to have a witness. I’m not sure of any good reason to continue using this mode.
Disk witness: Disk witness requires you to configure a standard Cluster Disk and dedicate it to quorum. Historically, we’ve built a SAN LUN of 512MB. If you have a SAN or other Fibre Channel/iSCSI target and don’t mind setting aside an entire LUN for quorum, this is a good choice. Really, the only reason not to go with a disk witness when you have this option available is if you want to have your quorum be separate from your target.
Cloud witness (2016+): The cloud witness lets you use a small space on Azure storage as your witness. It’s a cool option, but not everyone has reliable Internet. Since you’ve already got storage for your cluster, cloud quorum might be overkill. But, it’s not a bad thing if you’ve got solid Internet. I would choose this for geographically dispersed clusters but otherwise would probably skip it in favor of a disk or share witness.
Realistically, I think the share witness works great when you’re already relying on SMB storage. As those builds become more common, the file share witness will likely increase in popularity.
Reliability Requirements for a File Share Witness
You want your file share witness to be mostly reliable, but it does not need to be 100%. Do the best that you can, but do not invest an inordinate amount of time and effort trying to ensure that it never fails. If you have a scale-out file server that always is online, that’s best. But, if you just have a single SMB system that hosts your VMs, that will work too. Remember that Dynamic Quorum will work to keep your cluster at a reasonable quorum level even if the file share witness goes offline.
Need any help?
If you’re having any issues setting up and configuring a File Share Witness for a Hyper-V Cluster let me know in the comments below and I’ll help you out. Also, if you’ve got any feedback at all on what’s been written here I’d love to hear it!
For years, I’d never heard of this problem. Then, suddenly, I’m seeing it everywhere. It’s not easy to precisely outline a symptom tree for you. Networked applications will behave oddly. Remote desktop sessions may skip or hang. Some network traffic will not pass at all. Other traffic will behave erratically. Rather than try to give you a thorough symptom tree, we’ll just describe the setup that can be addressed with the contents of this article: you’re using Hyper-V with a third-party network load balancer and experiencing network-related problems.
Before I ever encountered it, the problem was described to me by one of my readers. Check out our Complete Guide to Hyper-V Networking article and look in the comments section for Jahn’s input. I had a different experience, but that conversation helped me reach a resolution much more quickly.
Problem Reproduction Instructions
The problem may appear under other conditions, but should always occur under these:
The network adapters that host the Hyper-V virtual switch are configured in a team
Load-balancing algorithm: Dynamic
Teaming mode: Switch Independent (likely occurs with switch-embedded teaming as well)
Traffic to/from affected virtual machines passes through a third-party load-balancer
Load balancer uses a MAC-based system for load balancing and source verification
Citrix Netscaler calls its feature “MAC based forwarding”
F5 load balancers call it “auto last hop”
The load balancer’s “internal” IP address is on the same subnet as the virtual machine’s
Sufficient traffic must be exiting the virtual machine for Hyper-V to load balance some of it to a different physical adapter
I’ll go into more detail later. This list should help you determine if you’re looking at an article that can help you.
Fixing the problem is very easy, and can be done without downtime. I’ll show the options in preference order. I’ll explain the impacting differences later.
Option 1: Change the Load-Balancing Algorithm
Your best bet is to change the load-balancing algorithm to “Hyper-V port”. You can change it in the lbfoadmin.exe graphical interface if your management operating system is GUI-mode Windows Server. To change it with PowerShell (assuming only one team):
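A minimal sketch of that change, assuming the host has exactly one LBFO team:

```powershell
# Assumes exactly one team exists on this host.
$team = Get-NetLbfoTeam
Set-NetLbfoTeam -Name $team.Name -LoadBalancingAlgorithm HyperVPort

# Confirm the result.
Get-NetLbfoTeam | Format-List Name, TeamingMode, LoadBalancingAlgorithm
```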
There will be a brief interruption of networking while the change is made. It won’t be as bad as the network problems that you’re already experiencing.
Option 2: Change the Teaming Mode
Your second option is to change your teaming mode. It’s more involved because you’ll also need to update your physical infrastructure to match. I’ve always been able to do that without downtime as long as I changed the physical switch first, but I can’t promise the same for anyone else.
Decide if you want to use Static teaming or LACP teaming. Configure your physical switch accordingly.
Change your Hyper-V host to use the same mode. If your Hyper-V system’s management operating system is Windows Server GUI, you can use lbfoadmin.exe. To change it in PowerShell (assuming only one team):
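A minimal sketch, again assuming a single team and that the physical switch ports have already been configured to match:

```powershell
# Assumes exactly one team exists on this host.
$team = Get-NetLbfoTeam

# Use Lacp or Static, whichever the physical switch has been set up for.
Set-NetLbfoTeam -Name $team.Name -TeamingMode Lacp
```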
Option 3: Disable the Feature on the Load Balancer
You could tell the load balancer to stop trying to be clever. In general, I would choose that option last.
An Investigation of the Problem
So, what’s going on? What caused all this? If you’ve got an environment that matches the one that I described, then you’ve unintentionally created the perfect conditions for a storm.
Whose fault is it? In this case, I don’t really think that it’s fair to assign fault. Everyone involved is trying to make your network traffic go faster. They sometimes do that by playing fast and loose in that gray area between Ethernet and TCP/IP. We have lots of standards that govern each individually, but not so many that apply to the ways that they can interact. The problem arises because Microsoft is playing one game while your load balancer plays another. The games have different rules, and neither side is aware that another game is afoot.
Traffic Leaving the Virtual Machine
We’ll start on the Windows guest side (also applies to Linux). Your application inside your virtual machine wants to send some data to another computer. That goes something like this:
Application: “Network, send this data to computer www.altaro.com on port 443”.
Network: “DNS server, get me the IP for www.altaro.com”
Network: “IP layer, determine if the IP address for www.altaro.com is on the same subnet”
Network: “IP layer, send this packet to the gateway”
IP layer passes downward for packaging in an Ethernet frame
Ethernet layer transfers the frame
The part to understand: your application and your operating system don’t really care about the Ethernet part. Whatever happens down there just happens. Especially, it doesn’t care at all about the source MAC.
Traffic Crossing the Hyper-V Virtual Switch
Because this particular Ethernet frame is coming out of a Hyper-V virtual machine, the first thing that it encounters is the Hyper-V virtual switch. In our scenario, the Hyper-V virtual switch rests atop a team of network adapters. As you’ll recall, that team is configured to use the Dynamic load balancing algorithm in Switch Independent mode. The algorithm decides if load balancing can be applied. The teaming mode decides which pathway to use and if it needs to repackage the outbound frame.
Switch independent mode means that the physical switch doesn’t know anything about a team. It only knows about two or more Ethernet endpoints connected in standard access mode. A port in that mode can “host” any number of MAC addresses; the physical switch’s capability defines the limit. However, the same MAC address cannot appear on multiple access ports simultaneously. Allowing that would cause all sorts of problems.
So, if the team wants to load balance traffic coming out of a virtual machine, it needs to ensure that the traffic has a source MAC address that won’t cause the physical switch to panic. For traffic going out of any adapter other than the primary adapter, it uses the MAC address of that physical adapter.
So, no matter how many physical adapters the team owns, one of two things will happen for each outbound frame:
The team will choose to use the physical adapter that the virtual machine’s network adapter is registered on. The Ethernet frame will travel as-is. That means that its source MAC address will be exactly the same as the virtual network adapter’s (meaning, not repackaged)
The team will choose to use an adapter other than the one that the virtual machine’s network adapter is registered on. The Ethernet frame will be altered. The source MAC address will be replaced with the MAC address of the physical adapter
Note: The visualization does not cover all scenarios. A virtual network adapter might be affinitized to the second physical adapter. If so, its load balanced packets would travel out of the shown “pNIC1” and use that physical adapter’s MAC as a source.
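The two outcomes can be modeled with a few lines of PowerShell. This is a toy illustration of the rule described above, not how the teaming driver is implemented; the function name and all MAC addresses are invented.

```powershell
function Get-OutboundSourceMac {
    param(
        [string]$VmMac,           # MAC of the virtual network adapter
        [string]$RegisteredNic,   # physical adapter the vNIC's MAC is registered on
        [string]$ChosenNic,       # physical adapter the team picked for this frame
        [hashtable]$NicMacs       # physical adapter name -> that adapter's own MAC
    )
    # Same adapter as the registration: the frame travels as-is.
    if ($ChosenNic -eq $RegisteredNic) { return $VmMac }
    # Any other adapter: the source MAC is replaced with that adapter's MAC.
    return $NicMacs[$ChosenNic]
}

$nicMacs = @{ pNIC1 = '02-00-00-00-00-01'; pNIC2 = '02-00-00-00-00-02' }

# Frame leaves on the registered adapter: source MAC stays 00-15-5D-01-02-03.
Get-OutboundSourceMac -VmMac '00-15-5D-01-02-03' -RegisteredNic 'pNIC1' -ChosenNic 'pNIC1' -NicMacs $nicMacs

# Frame is load balanced to the other adapter: source MAC becomes 02-00-00-00-00-02.
Get-OutboundSourceMac -VmMac '00-15-5D-01-02-03' -RegisteredNic 'pNIC1' -ChosenNic 'pNIC2' -NicMacs $nicMacs
```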
Traffic Crossing the Load Balancer
So, our frame arrives at the load balancer. The load balancer has a really crummy job. It needs to make traffic go faster, not slower. And, it acts like a TCP/IP router. Routers need to unpackage inbound Ethernet frames, look at their IP information, and make decisions on how to transmit them. That requires compute power and time.
If it needs too much time to do all this, then people would prefer to live without the load balancer. That means that the load balancer’s manufacturer doesn’t sell any units, doesn’t make any money, and goes out of business. So, they come up with all sorts of tricks to make traffic faster. One way to do that is by not doing quite so much work on the Ethernet frame. This is a gross oversimplification, but you get the idea:
Essentially, the load balancer only needs to remember which MAC address sent which frame, and then it doesn’t need to worry so much about all that IP nonsense (it’s really more complicated than that, but this is close enough).
The Hyper-V/Load Balancer Collision
Now we’ve arrived at the core of the problem: Hyper-V sends traffic from virtual machines using source MAC addresses that don’t belong to those virtual machines. The MAC addresses belong to the physical NIC. When the load balancer tries to associate that traffic with the MAC address of the physical NIC, everything breaks.
Trying to be helpful (remember that), the load balancer attempts to return what it deems as “response” traffic to the MAC that initiated the conversation. The MAC, in this case, belongs directly to that second physical NIC. It wasn’t expecting the traffic that’s now coming in, so it silently discards the frame.
That happens because:
The Windows Server network teaming load balancing algorithms are send only; they will not perform reverse translations. There are lots of reasons for that and they are all good, so don’t get upset with Microsoft. Besides, it’s not like anyone else does things differently.
Because the inbound Ethernet frame is not reverse-translated, its destination MAC belongs to a physical NIC. The Hyper-V virtual switch will not send any Ethernet frame to a virtual network adapter unless it owns the destination MAC
In typical system-to-system communications, the “responding” system would have sent its traffic to the IP address of the virtual machine. Through the normal course of typical networking, that traffic’s destination MAC would always belong to the virtual machine. It’s only because your load balancer is trying to speed things along that the frame is being sent to the physical NIC’s MAC address. Otherwise, the source MAC of the original frame would have been little more than trivia.
Stated a bit more simply: Windows Server network teaming doesn’t know that anyone cares about its frames’ source MAC addresses and the load balancer doesn’t know that anyone is lying about their MAC addresses.
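To make the collision concrete, here is a toy model in PowerShell. It assumes only the two rules described above: the load balancer replies to whatever source MAC it saw on the request, and the virtual switch delivers a frame only if a virtual adapter owns the destination MAC. The function name and MAC addresses are invented for the illustration.

```powershell
function Get-ReturnFrameFate {
    param(
        [string]$RequestSourceMac,      # source MAC the load balancer saw on the request
        [string[]]$VirtualAdapterMacs   # MACs owned by virtual adapters on the virtual switch
    )
    # MAC-based forwarding / auto last hop: the reply goes to the MAC the request came from.
    $replyDestinationMac = $RequestSourceMac

    # The virtual switch only forwards frames addressed to a virtual adapter's MAC.
    if ($VirtualAdapterMacs -contains $replyDestinationMac) { 'delivered' } else { 'discarded' }
}

$vmMac    = '00-15-5D-01-02-03'
$pNic2Mac = '02-00-00-00-00-02'

# Request left unmodified on the registered adapter: the reply reaches the VM.
Get-ReturnFrameFate -RequestSourceMac $vmMac -VirtualAdapterMacs @($vmMac)

# Request was load balanced and carried the physical adapter's MAC: the reply is dropped.
Get-ReturnFrameFate -RequestSourceMac $pNic2Mac -VirtualAdapterMacs @($vmMac)
```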
Why Hyper-V Port Mode Fixes the Problem
When you select the Hyper-V port load balancing algorithm in combination with the switch independent teaming mode, each virtual network adapter’s MAC address is registered on a single physical network adapter. That’s the same behavior that Dynamic uses. However, no load balancing is done for any given virtual network adapter; all traffic entering and exiting any given virtual adapter will always use the same physical adapter. The team achieves load balancing by placing each virtual network adapter across its physical members in a round-robin fashion.
Source MACs will always be those of their respective virtual adapters, so there’s nothing to get confused about.
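The round-robin placement can be pictured with a short sketch. It is a simplified model of the distribution described above, with made-up adapter names, and ignores whatever else the team may take into account.

```powershell
function Get-HyperVPortPlacement {
    param(
        [string[]]$VirtualAdapters,    # virtual network adapters, in the order they attach
        [string[]]$PhysicalAdapters    # team members
    )
    $placement = [ordered]@{}
    for ($i = 0; $i -lt $VirtualAdapters.Count; $i++) {
        # Each virtual adapter is pinned to exactly one team member, assigned in rotation.
        $placement[$VirtualAdapters[$i]] = $PhysicalAdapters[$i % $PhysicalAdapters.Count]
    }
    $placement
}

# Three virtual adapters across two team members: vm1 and vm3 share pNIC1, vm2 gets pNIC2.
Get-HyperVPortPlacement -VirtualAdapters 'vm1', 'vm2', 'vm3' -PhysicalAdapters 'pNIC1', 'pNIC2'
```

With only a handful of virtual adapters, the split can be lopsided, which is the drawback mentioned next.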
I like this mode as a solution because it does a good job addressing the issue without making any other changes to your infrastructure. The drawback would be if you only had a few virtual network adapters and weren’t getting the best distribution. For a 10GbE system, I wouldn’t worry.
Why Static and LACP Fix the Problem
Static and LACP teaming involve your Windows Server system and the physical switch agreeing on a single logical pathway that consists of multiple physical pathways. All MAC addresses are registered on that logical pathway. Therefore, the Windows Server team has no need of performing any source MAC substitution regardless of the load balancing algorithm that you choose.
Since no MAC substitution occurs here, the load balancer won’t get anything confused.
I don’t like this method as much. It means modifying your physical infrastructure. I’ve noticed that some physical switches don’t like the LACP failover process very much. I’ve encountered some that need a minute or more to notice that a physical link was down and react accordingly. With every physical switch that I’ve used or heard of, the switch independent mode fails over almost instantly.
That said, using a static or LACP team will allow you to continue using the Dynamic load balancing algorithm. All else being equal, you’ll get a more even load balancing distribution with Dynamic than you will with Hyper-V port mode.
Why You Should Let the Load Balancer Do Its Job
The third listed resolution suggests disabling the related feature on your load balancer. I don’t like that option, personally. I don’t have much experience with the Citrix product, but I know that the F5 buries their “Auto Last Hop” feature fairly deeply. Also, these two manufacturers enable the feature by default. It won’t be obvious to a maintainer that you’ve made the change.
However, your situation might dictate that disabling the load balancer’s feature causes fewer problems than changing the Hyper-V or physical switch configuration. Do what works best for you.
Using a Different Internal Router Also Addresses the Issue
In all of these scenarios, the load balancer performs routing. Actually, these types of load balancers always perform routing, because they present a single IP address for the service to the outside world and translate internally to the back-end systems.
However, nothing states that the internal source IP address of the load balancer must exist in the same subnet as the back-end virtual machines. You might do that for performance reasons; as I said above, routing incurs overhead. However, this is all a known quantity and modern routers are pretty good at what they do. If any router is present between the load balancer and the back-end virtual machines, then the MAC address issue will sort itself out regardless of your load balancing and teaming mode selections.
Have You Experienced this Phenomenon?
If so, I’d love to hear from you. On what system did you experience it? How did you resolve the situation (if you were able)? Perhaps you’ve just encountered it and arrived here to get a solution. If so, let me know if this explanation was helpful or if you need any further assistance regarding your particular environment. The comment section below awaits.