I’ve seen a lot of questions from people who have recently deployed Hyper-V for the first time. Some just need a few pointers to iron out minor glitches, but some are in really bad shape. Here are some of the common deployment mistakes and their solutions.
1. Mis-Provisioning Resources in Hyper-V
There are a lot of ways to get the hardware wrong. This is usually the result of not having system profiles or of taking advice from people who don’t have to write the checks for your systems.
Improper Balance of CPU and Memory
You are almost guaranteed to run out of memory long before you run out of CPU. Don’t be one of those poor souls who buys dual 20-core CPUs with 64GB of RAM. Memory can’t be shared. Even though you can use Dynamic Memory to squeeze in more VMs than might otherwise fit, the memory that each VM uses belongs only to that VM. CPU, on the other hand, spreads out nicely. In short, CPU cores aren’t dedicated to any single virtual machine, so the hypervisor can easily schedule the load from many virtual machines across them.
The best thing to do is run Performance Monitor traces against systems you intend to place into your new virtual environment. If you can’t do that, try to get some idea from people who have. If you can’t do that, you can talk to the manufacturers of the applications you’ll be virtualizing. They’re going to overstate their needs, but it’s a starting point. If all else fails, Microsoft used to recommend 8-12 virtual CPUs per physical core for virtualized server operating systems. It’s not wonderful, as explained in the linked article, but it’s better than nothing.
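To see why memory runs out first, here is a minimal Python sketch of the arithmetic. The host matches the example above (dual 20-core CPUs, 64 GB of RAM). The per-VM size (2 vCPUs, 4 GB), the 4 GB held back for the management operating system, and the choice of the low end of the old 8:1 vCPU-to-core guidance are all illustrative assumptions, not recommendations.

```python
def vm_capacity(physical_cores, host_ram_gb, vcpus_per_vm, ram_per_vm_gb,
                vcpus_per_core=8, reserved_ram_gb=4):
    """Return (VMs that fit by CPU, VMs that fit by memory).

    vcpus_per_core and reserved_ram_gb are illustrative assumptions:
    8 is the low end of the old 8-12 vCPU-per-core guidance, and 4 GB
    is a hypothetical reserve for the management operating system.
    """
    # CPU is shared: the host can schedule many vCPUs per physical core.
    fit_by_cpu = (physical_cores * vcpus_per_core) // vcpus_per_vm
    # Memory is not shared: each VM's RAM belongs to that VM alone.
    fit_by_memory = (host_ram_gb - reserved_ram_gb) // ram_per_vm_gb
    return fit_by_cpu, fit_by_memory


if __name__ == "__main__":
    by_cpu, by_memory = vm_capacity(physical_cores=40, host_ram_gb=64,
                                    vcpus_per_vm=2, ram_per_vm_gb=4)
    print(f"Fits by CPU: {by_cpu} VMs, fits by memory: {by_memory} VMs")
```

With these assumed numbers, CPU would allow 160 VMs while memory stops at 15, which is the imbalance described above.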
Improper Balance of Networked Storage and Network Connectivity
Two things I’ve noticed about storage are that people dramatically overestimate just how much storage speed they need and dramatically overestimate just how fast storage can perform. They’ll hear that RAID-10 is the fastest RAID build, so they’ll stick six or eight disks in a RAID-10 array and then blow a bunch of money connecting to it over dual 10GbE adapters. It will probably work, but mainly because people who build that sort of configuration usually don’t need a great deal of disk performance.
The short explanation is: spinning disks are very slow and 10GbE networking is very fast. A bonded 10GbE pair is very, very fast. You’re going to need a lot more than 6 or 8 disks to come anywhere near keeping that thing satisfied. Even if you’ve got that many disks, you’re still going to need lots and lots of demand, or the line is still going to be mostly empty.
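A rough Python sketch of that comparison follows. It treats 10GbE as 1,250 MB/s of raw line rate per port (ignoring protocol overhead) and assumes, purely for illustration, that one spinning disk delivers about 150 MB/s of sequential throughput, or about 180 IOPS at 64 KB per I/O under random access. Real figures vary by drive and workload.

```python
import math


def link_mb_per_s(gbps, ports=1):
    """Raw line rate in MB/s (1 MB = 10**6 bytes), ignoring protocol overhead."""
    return gbps * 1000 / 8 * ports


def disks_to_fill(link_rate_mb_s, per_disk_mb_s):
    """Minimum number of disks needed, at best, to saturate the link."""
    return math.ceil(link_rate_mb_s / per_disk_mb_s)


if __name__ == "__main__":
    bonded_pair = link_mb_per_s(10, ports=2)  # 2,500 MB/s
    # Assumed per-disk figures; substitute your own drives' numbers.
    sequential_mb_s = 150
    random_mb_s = 180 * 64 / 1000  # 180 IOPS x 64 KB per I/O
    print("Sequential:", disks_to_fill(bonded_pair, sequential_mb_s), "disks")
    print("Random:", disks_to_fill(bonded_pair, random_mb_s), "disks")
```

Even in the best case, purely sequential access, that comes to 17 disks under these assumptions; with the scattered I/O of many virtual machines it runs into the hundreds.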
It’s more important to size disk appropriately than CPU, because storage costs can escalate quickly. Try to get an idea of what your real I/O needs will be in advance. Don’t blindly ask people for advice; you’ll be advised to over-buy every time. If you don’t know, a reasonable rule of thumb is that databases performing hundreds of transactions per minute or more need lots of I/O, and most everything else doesn’t. In aggregate, disk needs can be high, of course, but a dozen VMs that each average 30 IOPS of load won’t even bog down four 15k disks.
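The dozen-VM claim can be checked with a quick Python sketch. The figure of 175 IOPS per 15k drive, the RAID-10 write penalty of 2, and the 30% write share are assumptions chosen for illustration; measure your own workload with Performance Monitor before trusting any of them.

```python
def backend_iops(frontend_iops, write_fraction, write_penalty):
    """Disk operations the array must perform for a given guest-visible load.

    Each guest write costs `write_penalty` physical writes (2 for RAID-10,
    since every block is mirrored); each read costs one operation.
    """
    reads = frontend_iops * (1 - write_fraction)
    writes = frontend_iops * write_fraction * write_penalty
    return reads + writes


if __name__ == "__main__":
    guest_load = 12 * 30        # a dozen VMs averaging 30 IOPS each
    array_capacity = 4 * 175    # four 15k disks at ~175 IOPS apiece (assumed)
    needed = backend_iops(guest_load, write_fraction=0.3, write_penalty=2)
    print(f"Need ~{needed:.0f} IOPS of {array_capacity} available "
          f"({needed / array_capacity:.0%} busy)")
```

Under these assumptions the array needs about 468 of its roughly 700 IOPS, leaving headroom even after the RAID-10 write penalty.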
Improper Balance of SSD and Spinning Disks
SSD is obviously a fantastic fit for virtualization loads. It’s really fast, and its low latency turns the scattered access of multiple virtual machines into a non-issue. It will be a while before it gets cheap enough to replace spinning disk, though. In the interim, we can build hybrid arrays that use both. With Storage Spaces, we can even do it with commodity server hardware.
Unfortunately, these SSDs are often used inappropriately. More than once, I’ve seen newcomers install Hyper-V Server on a pair of SSDs and use their spinning disks to hold virtual machines. This is a horrible misapplication of disk resources. Hyper-V Server, or Windows Server with Hyper-V, is going to do a lot of disk churn when it’s first turned on and then it’s going to sit idle. It’s your virtual machines that need disk I/O. If you’re not going to put those SSDs to use holding VM data, yank them and put them in a different computer where they can earn their keep. Better yet, use them with Storage Spaces as described in the previous paragraph.
Improper Balance of Networking Resources
If you’re building a standalone Hyper-V host with two network cards, the tribal knowledge has been to dedicate one to the management operating system and cram all the virtual machines into the other. In 2008 R2 and prior, you either had to do that or you had to go off-support and build a team with the manufacturer’s software. With the native teaming introduced in 2012, you don’t have to do that anymore.
For clustered hosts, I see people balancing networking resources in all sorts of odd ways. Virtual machines are given one or two physical adapters while an entire physical adapter is reserved for CSV traffic. Again, that was a necessity in 2008 R2 and earlier, but not anymore. In 2012 R2, a dedicated CSV network is all but pointless, provided that you have sufficient bandwidth available for cluster communications in general.
There are a lot of ways to screw up networking in Hyper-V. For this section, just spend some time thinking about where you need the most bandwidth to keep your virtualized applications happy, and design accordingly.
Improper Focus of Resources
I know that all the cool kids are telling you to always use fixed VHDXs for everything, but it’s lazy, terrible advice. Yes, they’re a bit faster, and yes, they’ll prevent you from ever accidentally over-provisioning storage. But they’re not much faster, and I have full faith that you can do the minor math necessary to keep from over-provisioning. If you provision virtual machines with C: drives for guest operating systems and other VHDX files for data, then that C: drive is ideal for the dynamically expanding format. You can set it for 60 GB and expect it to stay well under 40 GB for its entire life. If you’ve got 10 virtual machines, that’s at least 200 GB of savings. Or, if you unquestioningly use fixed, at least 200 GB of completely wasted space.
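The savings math above is simple enough to script. This Python sketch assumes every VM gets the same 60 GB C: drive and takes an estimate of each one’s actual usage; it ignores the small metadata overhead of a dynamically expanding VHDX, so treat the result as an approximation.

```python
def fixed_vhdx_waste_gb(provisioned_gb, used_gb_per_vm):
    """Space fixed VHDXs would reserve beyond what the guests actually use.

    A fixed VHDX consumes its full provisioned size on disk up front; a
    dynamically expanding one grows roughly with the data written to it.
    """
    return sum(provisioned_gb - used for used in used_gb_per_vm)


if __name__ == "__main__":
    # Ten VMs, each with a 60 GB C: drive expected to stay under 40 GB.
    print(fixed_vhdx_waste_gb(60, [40] * 10), "GB")
```

Because the guests are expected to stay well under 40 GB, the 200 GB this reports is a floor, not an estimate of the full savings.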
The same goes for a lot of other aspects of virtualization. Before you design your deployment: Stop! Think! What are the most likely bottlenecks you will face? That is where you focus, not on some list of false always/never rules dreamed up by somebody who copied it from someone else who copied it from someone else who invented it out of whole cloth.
The thing is, over-committing resources is sort of “what we do” in the virtualization space. If you’re uncomfortable with that, then maybe you’re just not ready for virtualization. In 2014 (or whatever year you happen to be reading this), it’s simply a waste to provision much of anything in a virtual environment at a 1:1 ratio unless you’re demonstrably certain that those resources are going to be consumed at that ratio.
2. Creating Too Many Networks and/or Virtual Adapters
This mistake category usually stems from not understanding the Hyper-V virtual switch. For a standalone Hyper-V system, the management operating system needs to be present in exactly one network using exactly one IP. Preferably, it will be routed (have access to a gateway), but you can skip this if your security requirements necessitate it. That single IP should register in DNS so your management system(s) can reach it by name. That IP doesn’t need to have anything in common with any of the guests at all. It doesn’t have to be on the same subnet(s) or VLAN(s). You don’t need to create an IP for the management OS in all of the VLANs/subnets that your guests will be using. You certainly don’t want to create a virtual adapter for anything to do with virtual machine traffic. Your guests will not be routing any traffic through the management operating system. The only other network presence you need to create for the management operating system would be for any iSCSI connections.
For a clustered host, you must have a management IP address and a cluster network IP address. You should create a cluster network to dedicate to Live Migrations. You can create additional cluster communications networks if you want, but they’re only useful if you can use SMB multichannel. You might also need some IPs for iSCSI connections. That’s it. Just as with the standalone, don’t go creating a lot of IP addresses within VM networks and don’t go creating virtual adapters in the management operating system for virtual machine traffic.
3. Creating Too Many Virtual Switches
You probably only need one virtual switch. There are use cases for multiple virtual switches, but not many. One would be if you have some guests you want to place in a DMZ while other guests need to stay on the corporate network. You could use dedicated physical switches with a dedicated virtual switch for the DMZ guests. It should be pointed out that for most organizations, VLANs or network virtualization can probably achieve a sufficient degree of isolation.
4. Optimizing Page Files
The purpose of a page file is to give applications access to more memory than is physically available to the operating system. The operating system prefers to use it to hold memory that is rarely accessed. If your page files have such a performance dependency that you need to put significant time into deciding where they go, then the memory they hold is not rarely accessed. That means either you broke something way back at step 1, or you did it right and some condition changed. Note that I’m not talking about preparation for Replica. As for the management operating system’s page file, it will have near-zero usage. You just need to be sure you have enough space for it.
5. Not Leveraging Dynamic Memory
When we bloggers and book authors write about Dynamic Memory, we exert so much effort scaring people away from using it on SQL and Exchange servers that a lot of people never use it for anything. There isn’t as much slack in memory as there is with CPU or disk, but it’s still there. The really nice thing about Dynamic Memory is that you can adjust it on the fly. Set what you think is a good minimum, erring toward the high side, and what you think is a good maximum, erring toward the low side. You can always reduce minimums and increase maximums. What you can’t do is modify fixed memory while the guest is on.
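The rule about what you can change on a running guest can be modeled in a few lines of Python. This is a hypothetical model of the behavior described above, not the Hyper-V API; in practice you make these changes in Hyper-V Manager or with PowerShell’s Set-VMMemory cmdlet.

```python
from dataclasses import dataclass


@dataclass
class DynamicMemorySettings:
    """Hypothetical model of Dynamic Memory limits, not a real Hyper-V API."""
    minimum_mb: int
    maximum_mb: int
    running: bool = False

    def set_minimum(self, new_mb: int) -> None:
        # While the guest runs, the minimum can only go down.
        if self.running and new_mb > self.minimum_mb:
            raise ValueError("minimum can only be lowered while the guest is running")
        if new_mb > self.maximum_mb:
            raise ValueError("minimum cannot exceed maximum")
        self.minimum_mb = new_mb

    def set_maximum(self, new_mb: int) -> None:
        # While the guest runs, the maximum can only go up.
        if self.running and new_mb < self.maximum_mb:
            raise ValueError("maximum can only be raised while the guest is running")
        if new_mb < self.minimum_mb:
            raise ValueError("maximum cannot be below minimum")
        self.maximum_mb = new_mb


if __name__ == "__main__":
    vm = DynamicMemorySettings(minimum_mb=2048, maximum_mb=8192, running=True)
    vm.set_minimum(1024)   # allowed: lowering the minimum
    vm.set_maximum(12288)  # allowed: raising the maximum
    try:
        vm.set_maximum(4096)  # refused: lowering the maximum needs a shutdown
    except ValueError as err:
        print("Refused:", err)
```

The asymmetry is why it pays to start with a generous minimum and a conservative maximum: both of those can be corrected without downtime.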
6. Leaving Default VM Configurations
If you use the wizard to create a virtual machine, you get a single vCPU. If you enable Dynamic Memory, its maximum will be 1TB. If your guest operating system is post-XP/Server 2003, you want at least 2 vCPUs. You almost definitely don’t want Dynamic Memory to allow up to 1TB, not least because you don’t have the physical memory to back it up. As I said in #5, you can always increase the maximum, but you have to turn the guest off to reduce it. If it’s at a ridiculously high number and you get some greedy or memory-leaking application running, one guest can throw off the entire balance of the host’s allocation to every other guest.
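A quick audit for leftover wizard defaults might look like this Python sketch. The thresholds come from this section (at least 2 vCPUs for a post-XP/Server 2003 guest, and a Dynamic Memory maximum the host can actually back). The function name and the comparison against the host’s physical memory are illustrative assumptions, and gathering the real values from your hosts is left out.

```python
WIZARD_DEFAULT_VCPUS = 1
WIZARD_DEFAULT_MAX_MB = 1024 * 1024  # 1 TB, the wizard's Dynamic Memory maximum


def review_vm_config(vcpus, dynamic_max_mb, host_physical_mb, modern_guest=True):
    """Return warnings for settings that look like untouched wizard defaults."""
    warnings = []
    if modern_guest and vcpus < 2:
        warnings.append(f"{vcpus} vCPU; post-XP/2003 guests want at least 2")
    if dynamic_max_mb >= host_physical_mb:
        warnings.append(
            f"Dynamic Memory maximum of {dynamic_max_mb} MB could claim all of "
            f"the host's {host_physical_mb} MB of physical memory")
    return warnings


if __name__ == "__main__":
    # A guest left at wizard defaults on a hypothetical 256 GB host.
    for warning in review_vm_config(WIZARD_DEFAULT_VCPUS, WIZARD_DEFAULT_MAX_MB,
                                    host_physical_mb=256 * 1024):
        print("Warning:", warning)
```

Flagging any maximum at or above the host’s physical memory is a deliberately loose check; you will probably want a tighter per-guest ceiling based on your own sizing.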
7. Not Troubleshooting the Right Thing
“I bought a 120-core server with 32TB of RAM and a SAN with 720 SSD disks and Hyper-V guest disk access is SOOOOO SLOW and it shouldn’t be because look at all this hardware!”
See the problem?
“Oh, my network connection between the host and the SAN? Well, they have 10GB cards so that’s not it.”
Did we really get an answer?
“Oh, the interconnects? Well, they go into Cisco switches that then go to a couple of old Novell Netware servers running TCP/IP-to-IPX/SPX gateways through a couple of really old Ethernet II hubs connected by 847 feet of Cat-3 cable, about 350 feet of which runs through an air conditioner bank, over an exposed hilltop, and around a nuclear waste dump. Why? Is that a problem?”
OK, so that’s never happened. I think. But, the point is, people never seem to remember all of the components that go into making this stuff work.
It’s not just resources, either.
“I virtualized a web server and no one can get to it! What is wrong with Hyper-V? Didn’t Microsoft test this stuff?”
Two or three questions later.
“Well, yeah, I left the firewall on without a port 80 exception and didn’t put the guest’s adapter into the correct VLAN or give it good IP information, but still, it’s all because Hyper-V!”
Virtual machines can’t do anything magical. They have to be configured correctly, just like a physical environment.
8. Overloading the Management Operating System
The management operating system should run virtual machines and backup software. If you want, it can run anti-malware software. End of list.
Want to run Active Directory domain services? A software router? A web server? A Team Fortress 2 dedicated server? Hmm, I wonder where that sort of thing could go… Oh, look! A hypervisor! A thing that can run virtual machines! Up to 1,024 on the same host! If it isn’t a virtual machine or something to back up a virtual machine or something meant to prevent the hypervisor or a virtual machine from being compromised, then it goes inside a virtual machine. Every time. If you don’t want to virtualize it, then find another physical system. If you don’t want to do that, then I’m sorry, I just don’t think you’re understanding this whole virtualization thing.
9. Leaving the Management OS in Workgroup Mode When there is a Perfectly Good AD Domain Available
Is your Hyper-V host in the DMZ? No? Then join it to the domain if you have one. I know you’ve been told that it’s more secure to leave it in workgroup configuration, but you’ve been told very, very wrong. Are you concerned that if someone compromises the management operating system and it’s in the domain that they’ll also be able to access all your guests? Guess what? If it’s in the workgroup and they compromise it, that situation is not any better! If an attacker gains access to the host, s/he has, at minimum, read access to all the VHDX files for all its guests. If any of them are in the domain, there is no functional difference whether or not the host is a domain member. Worse, your host has been compromised and the only barrier you put up to stop it was workgroup-grade security.
If you’re doing it because all of your domain controllers are virtual and you got conned by the chicken-and-egg myth, we debunked that a while ago. The only possible chicken-and-egg scenario is if your DCs are all virtualized and stored on SMB 3 shares, because SMB 3 connections are refused if they can’t be authenticated. The first, best answer is “don’t put [all] virtual DCs on SMB 3 shares”. The last, worst answer is “don’t join the host to the domain”, especially considering that it won’t fix your SMB 3 problem either.
10. Not Testing
Failure to test is sort of a universal failure of IT shops. Just because a set of hardware and a particular configuration should work doesn’t mean that it will. Plan -> deploy -> test -> go live. Order variance is not acceptable.
11. Avoiding PowerShell
I can understand not wanting to learn something new. It’s hard, it breaks the pattern that works for you, and there are so many other things you need to do that there’s just no time. Thing is, there’s a lot of Hyper-V that you can only get to with PowerShell. The same is true of most Microsoft server products. It seems like a lot to learn, but once you automate something that you used to have to do manually, you become addicted pretty quickly. Don’t cut off the best tool you’ve got.
12. Not Figuring Out Licensing in Advance
I’m going to estimate that about 90% of the unofficial material out there on licensing is junk. I’ve written a couple of posts on it, most recently this one, but it’s clear that we’re not winning the war on licensing ignorance. It would be nice if Microsoft would publish a clearer licensing FAQ, but they don’t. The thing is, you know that you have to buy licenses, and you probably know who you’re going to buy them from. Rather than doing a bunch of searches and scouring forums and listening to people that are not authorized to speak on the subject, just call that vendor. If they’re an authorized reseller of Microsoft licenses, they’ve got someone on staff that will be able to ask you a couple of questions and tell you within the span of a few minutes exactly what you need to buy.
This is really important. If you’re ever audited (all it takes to trigger an audit is a single phone call) and you’re out of compliance, the fines can get astronomical. They do tend to show leniency when you can make a convincing case that you didn’t know any better. However, out of the organizations I know that have been audited, all of them that were out of compliance had to pay at least a symbolic fine — and I’m pretty sure they disagreed with the auditors on how much money constitutes “symbolic”. Why did they have to pay? Because the phone call is free, that’s why. No one can really say they faced meaningful barriers to finding out what to do. You might say, “Hey, this Eric Siron guy on this page here said that I could do this.” Know what that gets you? A fine. Are they going to come after me? No. Am I going to pay it? No. If I made a mistake or wasn’t clear enough, then you have my apologies and sympathies, but nothing else. The responsibility is yours, and yours alone, to get this right. Make the phone call.