Few Hyper-V topics burn up the Internet quite like “performance”. No matter how fast it goes, we always want it to go faster. If you search even a little, you’ll find many articles with long lists of ways to improve Hyper-V’s performance. The less focused articles start with general Windows performance tips and sprinkle some Hyper-V-flavored spice on them. I want to use this article to tighten the focus down on Hyper-V hardware settings only. That means it won’t be as long as some others; I’ll just think of that as wasting less of your time.
1. Upgrade your system
I guess this goes without saying but every performance article I write will always include this point front-and-center. Each piece of hardware has its own maximum speed. Where that speed barrier lies in comparison to other hardware in the same category almost always correlates directly with cost. You cannot tweak a go-cart to outrun a Corvette without spending at least as much money as just buying a Corvette — and that’s without considering the time element. If you bought slow hardware, then you will have a slow Hyper-V environment.
Fortunately, this point has a corollary: don’t panic. Production systems, especially server-class systems, almost never experience demand levels that compare to the stress tests that admins put on new equipment. If typical load levels were that high, it’s doubtful that virtualization would have caught on so quickly. We use virtualization for so many reasons nowadays, we forget that “cost savings through better utilization of under-loaded server equipment” was one of the primary drivers of early virtualization adoption.
2. BIOS Settings for Hyper-V Performance
Don’t neglect your BIOS! It contains some of the most important settings for Hyper-V.
- C States. Disable C States! Few things impact Hyper-V performance quite as strongly as C States! Names and locations will vary, so look in areas related to Processor/CPU, Performance, and Power Management. If you can’t find anything that specifically says C States, then look for settings that disable/minimize power management. C1E is usually the worst offender for Live Migration problems, although other modes can cause issues.
- Virtualization support: A number of features have popped up through the years, but most BIOS manufacturers have since consolidated them all into a global “Virtualization Support” switch, or something similar. I don’t believe that current versions of Hyper-V will even run if these settings aren’t enabled. Here are some individual component names, for those special BIOSs that break them out:
- Virtual Machine Extensions (VMX)
- AMD-V — AMD CPUs/mainboards. Be aware that Hyper-V can’t (yet?) run nested virtual machines on AMD chips
- VT-x, or sometimes just VT — Intel CPUs/mainboards. Required for nested virtualization with Hyper-V in Windows 10/Server 2016
- Data Execution Prevention: DEP means less for performance and more for security. It’s also a requirement. But, we’re talking about your BIOS settings and you’re in your BIOS, so we’ll talk about it. Just make sure that it’s on. If you don’t see it under the DEP name, look for:
- No Execute (NX) — AMD CPUs/mainboards
- Execute Disable (XD) — Intel CPUs/mainboards
- Second Level Address Translation: I’m including this for completion. It’s been many years since any system was built new without SLAT support. If you have one, following every point in this post to the letter still won’t make that system fast. Starting with Windows 8 and Server 2016, you cannot use Hyper-V without SLAT support. Names that you will see SLAT under:
- Nested Page Tables (NPT)/Rapid Virtualization Indexing (RVI) — AMD CPUs/mainboards
- Extended Page Tables (EPT) — Intel CPUs/mainboards
- Disable power management. This goes hand-in-hand with C States. Just turn off power management altogether. Get your energy savings via consolidation. You can also buy lower wattage systems.
- Use Hyperthreading. I’ve seen a tiny handful of claims that Hyperthreading causes problems on Hyper-V. I’ve heard more convincing stories about space aliens. I’ve personally seen the same number of space aliens as I’ve seen Hyperthreading problems with Hyper-V (that would be zero). If you’ve legitimately encountered a problem that was fixed by disabling Hyperthreading AND you can prove that it wasn’t a bad CPU, that’s great! Please let me know. But remember, you’re still in a minority of a minority of a minority. The rest of us will run Hyperthreading.
- Disable SCSI BIOSs. Unless you are booting your host from a SAN, kill the BIOSs on your SCSI adapters. It doesn’t do anything good or bad for a running Hyper-V host but slows down physical boot times.
- Disable BIOS-set VLAN IDs on physical NICs. Some network adapters support VLAN tagging through boot-up interfaces. If you then bind a Hyper-V virtual switch to one of those adapters, you could encounter all sorts of network nastiness.
3. Storage Settings for Hyper-V Performance
I wish the IT world would learn to cope with the fact that rotating hard disks do not move data very quickly. If you just can’t cope with that, buy a gigantic lot of them and make big RAID 10 arrays. Or, you could get a stack of SSDs. Don’t get six or so spinning disks and get sad that they “only” move data at a few hundred megabytes per second. That’s how the tech works.
Performance tips for storage:
- Learn to live with the fact that storage is slow.
- Remember that speed tests do not reflect real world load and that file copy does not test anything except permissions.
- Learn to live with Hyper-V’s I/O scheduler. If you want a computer system to have 100% access to storage bandwidth, start by checking your assumptions. Just because a single file copy doesn’t go as fast as you think it should, does not mean that the system won’t perform its production role adequately. If you’re certain that a system must have total and complete storage speed, then do not virtualize it. The only way that a VM can get that level of speed is by stealing I/O from other guests.
- Enable read caches
- Carefully consider the potential risks of write caching. If acceptable, enable write caches. If your internal disks, DAS, SAN, or NAS has a battery backup system that can guarantee clean cache flushes on a power outage, write caching is generally safe. Internal batteries that report their status and/or automatically disable caching are best. UPS-backed systems are sometimes OK, but they are not foolproof.
- Prefer few arrays with many disks over many arrays with few disks.
- Unless you’re going to store VMs on a remote system, do not create an array just for Hyper-V. By that, I mean that if you’ve got six internal bays, do not create a RAID-1 for Hyper-V and a RAID-x for the virtual machines. That’s a Microsoft SQL Server 2000 design. This is 2017 and you’re building a Hyper-V server. Use all the bays in one big array.
- Do not architect your storage to make the hypervisor/management operating system go fast. I can’t believe how many times I read on forums that Hyper-V needs lots of disk speed. After boot-up, it needs almost nothing. The hypervisor remains resident in memory. Unless you’re doing something questionable in the management OS, it won’t even page to disk very often. Architect storage speed in favor of your virtual machines.
- Set your fibre channel SANs to use very tight WWN masks. Live Migration requires a hand off from one system to another, and the looser the mask, the longer that takes. With 2016 the guests shouldn’t crash, but the hand-off might be noticeable.
- Keep iSCSI/SMB networks clear of other traffic. I see a lot of recommendations to put each and every iSCSI NIC on a system into its own VLAN and/or layer-3 network. I’m on the fence about that; a network storm in one iSCSI network would probably justify it. However, keeping those networks quiet would go a long way on its own. For clustered systems, multi-channel SMB needs each adapter to be on a unique layer 3 network (according to the docs; from what I can tell, it works even with same-net configurations).
- If using gigabit, try to physically separate iSCSI/SMB from your virtual switch. Meaning, don’t make that traffic endure the overhead of virtual switch processing, if you can help it.
- Round robin MPIO might not be the best, although it’s the most recommended. If you have one of the aforementioned network storms, Round Robin will negate some of the benefits of VLAN/layer 3 segregation. I like least queue depth, myself.
- MPIO and SMB multi-channel are much faster and more efficient than the best teaming.
- If you must run MPIO or SMB traffic across a team, create multiple virtual or logical NICs. It will give the teaming implementation more opportunities to create balanced streams.
- Use jumbo frames for iSCSI/SMB connections if everything supports it (host adapters, switches, and back-end storage). You’ll improve the header-to-payload bit ratio by a meaningful amount.
- Enable RSS on SMB-carrying adapters. If you have RDMA-capable adapters, absolutely enable that.
- Use dynamically-expanding VHDX, but not dynamically-expanding VHD. I still see people recommending fixed VHDX for operating system VHDXs, which is just absurd. Fixed VHDX is good for high-volume databases, but mostly because they’ll probably expand to use all the space anyway. Dynamic VHDX enjoys higher average write speeds because it completely ignores zero writes. No defined pattern has yet emerged that declares a winner on read rates, but people who say that fixed always wins are making demonstrably false assumptions.
- Do not use pass-through disks. The performance is sometimes a little bit better, but sometimes it’s worse, and it almost always causes some other problem elsewhere. The trade-off is not worth it. Just add one spindle to your array to make up for any perceived speed deficiencies. If you insist on using pass-through for performance reasons, then I want to see the performance traces of production traffic that prove it.
- Don’t let fragmentation keep you up at night. Fragmentation is a problem for single-spindle desktops/laptops, “admins” that never should have been promoted above first-line help desk, and salespeople selling defragmentation software. If you’re here to disagree, you better have a URL to performance traces that I can independently verify before you even bother entering a comment. I have plenty of Hyper-V systems of my own on storage ranging from 3-spindle up to >100 spindle, and the first time I even feel compelled to run a defrag (much less get anything out of it) I’ll be happy to issue a mea culpa. For those keeping track, we’re at 6 years and counting.
4. Memory Settings for Hyper-V Performance
There isn’t much that you can do for memory. Buy what you can afford and, for the most part, don’t worry about it.
- Buy and install your memory chips optimally. Multi-channel memory is somewhat faster than single-channel. Your hardware manufacturer will be able to help you with that.
- Don’t over-allocate memory to guests. Just because your file server had 16GB before you virtualized it does not mean that it has any use for 16GB.
- Use Dynamic Memory unless you have a system that expressly forbids it. It’s better to stretch your memory dollar farther than wring your hands about whether or not Dynamic Memory is a good thing. Until directly proven otherwise for a given server, it’s a good thing.
- Don’t worry so much about NUMA. I’ve read volumes and volumes on it. Even spent a lot of time configuring it on a high-load system. Wrote some about it. Never got any of that time back. I’ve had some interesting conversations with people that really did need to tune NUMA. They constitute… oh, I’d say about .1% of all the conversations that I’ve ever had about Hyper-V. The rest of you should leave NUMA enabled at defaults and walk away.
5. Network Settings for Hyper-V Performance
Networking configuration can make a real difference to Hyper-V performance.
- Learn to live with the fact that gigabit networking is “slow” and that 10GbE networking often has barriers to reaching 10Gbps for a single test. Most networking demands don’t even bog down gigabit. It’s just not that big of a deal for most people.
- Learn to live with the fact that a) your four-spindle disk array can’t fill up even one 10GbE pipe, much less the pair that you assigned to iSCSI and that b) it’s not Hyper-V’s fault. I know this doesn’t apply to everyone, but wow, do I see lots of complaints about how Hyper-V can’t magically pull or push bits across a network faster than a disk subsystem can read and/or write them.
- Disable VMQ on gigabit adapters. I think some manufacturers are finally coming around to the fact that they have a problem. Too late, though. The purpose of VMQ is to redistribute inbound network processing for individual virtual NICs away from CPU 0, core 0 to the other cores in the system. Current-model CPUs are fast enough that they can handle many gigabit adapters.
- If you are using a Hyper-V virtual switch on a network team and you’ve disabled VMQ on the physical NICs, disable it on the team adapter as well. I’ve been saying that since shortly after 2012 came out and people are finally discovering that I’m right, so, yay? Anyway, do it.
- Don’t worry so much about vRSS. RSS is like VMQ, only for non-VM traffic. vRSS, then, is the projection of VMQ down into the virtual machine. Basically, with traditional VMQ, the VMs’ inbound traffic is separated across pNICs in the management OS, but then each guest still processes its own data on vCPU 0. vRSS splits traffic processing across vCPUs inside the guest once it gets there. The “drawback” is that distributing processing and then redistributing processing causes more processing. So, the load is nicely distributed, but it’s also higher than it would otherwise be. The upshot: almost no one will care. Set it or don’t set it, it’s probably not going to impact you a lot either way. If you’re new to all of this, then you’ll find an “RSS” setting on the network adapter inside the guest. If that’s on in the guest (off by default) and VMQ is on and functioning in the host, then you have vRSS. woohoo.
- Don’t blame Hyper-V for your networking ills. I mention this in the context of performance because your time has value. I’m constantly called upon to troubleshoot Hyper-V “networking problems” because someone is sharing MACs or IPs or trying to get traffic from the dark side of the moon over a Cat-3 cable with three broken strands. Hyper-V is also almost always blamed by people that just don’t have a functional understanding of TCP/IP. More wasted time that I’ll never get back.
- Use one virtual switch. Multiple virtual switches cause processing overhead without providing returns. This is a guideline, not a rule, but you need to be prepared to provide an unflinching, sure-footed defense for every virtual switch in a host after the first.
- Don’t mix gigabit with 10 gigabit in a team. Teaming will not automatically select 10GbE over the gigabit. 10GbE is so much faster than gigabit that it’s best to just kill gigabit and converge on the 10GbE.
- 10x gigabit cards do not equal 1x 10GbE card. I’m all for only using 10GbE when you can justify it with usage statistics, but gigabit just cannot compete.
6. Maintenance Best Practices
Don’t neglect your systems once they’re deployed!
- Take a performance baseline when you first deploy a system and save it.
- Take and save another performance baseline when your system reaches a normative load level (basically, once you’ve reached its expected number of VMs).
- Keep drivers reasonably up-to-date. Verify that settings aren’t lost after each update.
- Monitor hardware health. The Windows Event Log often provides early warning symptoms, if you have nothing else.
If you carry out all (or as many as possible) of the above hardware adjustments you will witness a considerable jump in your hyper-v performance. That I can guarantee. However, for those who don’t have the time, patience or prepared to make the necessary investment in some cases, Altaro has developed an e-book just for you. Find out more about it here: Supercharging Hyper-V Performance for the time-strapped admin.
Fragmentation is the most crippling problem in computing, wouldn’t you agree? I mean, that’s what the strange guy downtown paints on his walking billboard, so it must be true, right? And fragmentation is at least five or six or a hundred times worse for a VHDX file, isn’t it? All the experts are saying so, according to my psychic.
But, when I think about it, my psychic also told me that I’d end up rich with a full head of hair. And, I watched that downtown guy lose a bet to a fire hydrant. Maybe those two aren’t the best authorities on the subject. Likewise, most of the people that go on and on about fragmentation can’t demonstrate anything concrete that would qualify them as storage experts. In fact, they sound a lot like that guy that saw your employee badge in the restaurant line and ruined your lunch break by trying to impress you with all of his anecdotes proving that he “knows something about computers” in the hopes that you’d put in a good word for him with your HR department (and that they have a more generous attitude than his previous employers on the definition of “reasonable hygiene practices”).
To help prevent you from ever sounding like that guy, we’re going to take a solid look at the “problem” of fragmentation.
Where Did All of this Talk About Fragmentation Originate?
Before I get very far into this, let me point out that all of this jabber about fragmentation is utter nonsense. Most people that are afraid of it don’t know any better. The people that are trying to scare you with it either don’t know what they’re talking about or are trying to sell you something. If you’re about to go to the comments section with some story about that one time that a system was running slowly but you set everything to rights with a defrag, save it. I once bounced a quarter across a twelve foot oak table, off a guy’s forehead, and into a shot glass. Our anecdotes are equally meaningless, but at least mine is interesting and I can produce witnesses.
The point is, the “problem” of fragmentation is mostly a myth. Like most myths, it does have some roots in truth. To understand the myth, you must know its origins.
These Aren’t Your Uncle’s Hard Disks
In the dark ages of computing, hard disks were much different from the devices that you know and love today. I’m young enough that I missed the very early years, but the first one owned by my family consumed the entire top of a desktop computer chassis. I was initially thrilled when my father presented me with my very own PC as a high school graduation present. I quickly discovered that it was a ploy to keep me at home a little longer because it would be quite some time before I could afford an apartment large enough to hold its hard drive. You might be thinking, “So what, they were physically bigger. I have a dozen magazines claiming that size doesn’t matter!” Well, those articles weren’t written about computer hard drives, were they? In hard drives, physical characteristics matter.
Old Drives Were Physically Larger
The first issue is diameter. Or, more truthfully, radius. You see, there’s a little arm inside that hard drive whose job it is to move back and forth from the inside edge to the outside edge of the platter and back, picking up and putting down bits along the way. That requires time. The further the distance, the more time required. Even if we pretend that actuator motors haven’t improved at all, less time is required to travel a shorter distance. I don’t know actual measurements, but it’s a fair guess that those old disks had over a 2.5-inch radius, whereas modern 3.5″ disks are closer to a 1.5″ radius and 2.5″ disks something around a 1″ radius. It doesn’t sound like much until you compare them by percentage differences. Modern enterprise-class hard disks have less than half the maximum read/write head travel distance of those old units.
It’s not just the radius. The hard disk that I had wasn’t only wide, it was also tall. That’s because it had more platters in it than modern drives. That’s important because, whereas each platter has its own set of read/write heads, a single motor controls all of the arms. Each additional platter increases the likelihood that the read/write head arm will need to move a meaningful distance to find data between any two read/write operations. That adds time.
Old Drives Were Physically Slower
After size, there’s rotational speed. The read/write heads follow a line from the center of the platter out to the edge of the platter, but that’s their only range of motion. If a head isn’t above the data that it wants, then it must hang around and wait for that data to show up. Today, we think of 5,400 RPM drives as “slow”. That drive of mine was moping along at a meagerly 3,600 RPM. That meant even more time was required to get/set data.
There were other factors that impacted speed as well, although none quite so strongly as rotational speed improvements. The point is, physical characteristics in old drives meant that they pushed and pulled data much more slowly than modern drives.
Old Drives Were Dumb
Up until the mid-2000s, every drive in (almost) every desktop computer used a PATA IDE or EIDE interface (distinction is not important for this discussion). A hard drive’s interface is the bit that sits between the connecting cable bits and the spinning disk/flying head bits. It’s the electronic brain that figures out where to put data and where to go get data. IDE brains are dumb (another word for “cheap”). They operate on a FIFO (first-in first-out) basis. This is an acronym that everyone knows but almost no one takes a moment to think about. For hard drives, it means that each command is processed in exactly the order in which it was received. Let’s say that it gets the following:
- Read data from track 1
- Write data to track 68,022
- Read data from track 2
An IDE drive will perform those operations in exactly that order, even though it doesn’t make any sense. If you ever wondered why SCSI drives were so much more expensive than IDE drives, that was part of the reason. SCSI drives were a lot smarter. They would receive a list of demands from the host computer, plot the optimal course to satisfy those requests, and execute them in a logical fashion.
In the mid-2000s, we started getting new technology. AHCI and SATA emerged from the primordial acronym soup as Promethean saviors, bringing NCQ (native command queuing) to the lowly IDE interface. For the first time, IDE drives began to behave like SCSI drives. … OK, that’s overselling NCQ. A lot. It did help, but not as much as it might have because…
Operating Systems Take More Responsibility
It wasn’t just hard drives that operated in FIFO. Operating systems started it. They had good excuses, though. Hard drives were slow, but so were all of the other components. A child could conceive of better access techniques than FIFO, but even PhDs struggled against the CPU and memory requirements to implement them. Time changed all of that. Those other components gained remarkable speed improvements while hard disks lagged behind. Before “NCQ” was even coined, operating systems learned to optimize requests before sending them to the IDE’s FIFO buffers. That’s one of the ways that modern operating systems manage disk access better than those that existed at the dawn of defragmentation, but it’s certainly not alone.
This Isn’t Your Big Brother’s File System
The venerated FAT file system did its duty and did it well. But, the nature of disk storage changed dramatically, which is why we’ve mostly stopped using FAT. Now we have NTFS, and even that is becoming stale. Two things that it does a bit better than FAT is metadata placement and file allocation. Linux admins will be quick to point out that virtually all of their file systems are markedly better at preventing fragmentation than NTFS. However, most of the tribal knowledge around fragmentation on the Windows platform sources from the FAT days, and NTFS is certainly better than FAT.
Some of Us Keep Up with Technology
It was while I owned that gigantic, slow hard drive that the fear of fragmentation wormed its way into my mind. I saw some very convincing charts and graphs and read a very good spiel and I deeply absorbed every single word and took the entire message to heart. That was also the same period of my life in which I declined free front-row tickets to Collective Soul to avoid rescheduling a first date with a girl with whom I knew I had no future. It’s safe to say that my judgment was not sound during those days.
Over the years, I became a bit wiser. I looked back and realized some of the mistakes that I’d made. In this particular case, I slowly came to understand that everything that convinced me to defragment was marketing material from a company that sold defragmentation software. I also forced myself to admit that I never could detect any post-defragmentation performance improvements. I had allowed the propaganda to sucker me into climbing onto a bandwagon carrying a lot of other suckers, and we reinforced each others’ delusions.
That said, we were mostly talking about single-drive systems in personal computers. That transitions right into the real problem with the fragmentation discussion.
Server Systems are not Desktop Systems
I was fortunate enough that my career did not immediately shift directly from desktop support into server support. I worked through a gradual transition period. I also enjoyed the convenience of working with top-tier server administrators. I learned quickly, and thoroughly, that desktop systems and server systems are radically different.
You rely on your desktop or laptop computer for multiple tasks. You operate e-mail, web browsing, word processing, spreadsheet, instant messaging, and music software on a daily basis. If you’re a gamer, you’ve got that as well. Most of these applications use small amounts of data frequently and haphazardly; some use large amounts of data, also frequently and haphazardly. The ratio of write operations to read operations is very high, with writes commonly outnumbering reads.
Servers are different. Well-architected servers in an organization with sufficient budget will run only one application or application suite. If they use much data, they’ll rely on a database. In almost all cases, server systems perform substantially more read operations than write operations.
The end result is that server systems almost universally have more predictable disk I/O demands and noticeably higher cache hits than desktop systems. Under equal fragmentation levels, they’ll fare better.
Whether or not you’d say that server-class systems contain “better” hardware than desktop system is a matter of perspective. Server systems usually provide minimal video capabilities and their CPUs have gigantic caches but are otherwise unremarkable. That only makes sense; playing the newest Resident Evil at highest settings with a smooth frame rate requires substantially more resources than a domain controller for 5,000 users. Despite what many lay people have come to believe, server systems typically don’t work very hard. We build them for reliability, not speed.
Where servers have an edge is storage. SCSI has a solid record as the premier choice for server-class systems. For many years, it was much more reliable, although the differences are negligible today. One advantage that SCSI drives maintain over their less expensive cousins is higher rotational speeds. Of all the improvements that I mentioned above, the most meaningful advance in IDE drives was the increase of rotational speed from 3,600 RPM to 7,200 RPM. That’s a 100% gain. SCSI drives ship with 10,000 RPM motors (~38% faster than 7,200 RPM) and 15,000 RPM motors (108% faster than 7,200 RPM!).
Spindle speed doesn’t address the reliability issue, though. Hard drives need many components, and a lot of them move. Mechanical failure due to defect or wear is a matter of “when”, not “if”. Furthermore, they are susceptible to things that other component designers don’t even think about. If you get very close to a hard drive and shout at it while it’s powered, you can cause data loss. Conversely, my solid-state phone doesn’t seem to suffer nearly as much as I do even after the tenth attempt to get “OKAY GOOGLE!!!” to work as advertised.
Due to the fragility of spinning disks, almost all server systems architects design them to use multiple drives in a redundant configuration (lovingly known as RAID). The side effect of using multiple disks like this is a speed boost. We’re not going to talk about different RAID types because that’s not important here. The real point is that in practically all cases, a RAID configuration is faster than a single disk configuration. The more unique spindles in an array, the higher its speed.
With SCSI and RAID, it’s trivial to achieve speeds that are many multipliers faster than a single disk system. If we assume that fragmentation has ill effects and that defragmentation has positive effects, they are mitigated by the inherent speed boosts of this topology.
These Differences are Meaningful
When I began taking classes to train desktop support staff to become server support staff, I managed to avoid asking any overly stupid questions. My classmates weren’t so lucky. One asked about defragmentation jobs on server systems. The echoes of laughter were still reverberating through the building when the instructor finally caught his breath enough to choke out, “We don’t defragment server systems.” The student was mortified into silence, of course. Fortunately, there were enough shared sheepish looks that the instructor felt compelled to explain it. That was in the late ’90s, so the explanation was a bit different then, but it still boiled down to differences in usage and technology.
With today’s technology, we should be even less fearful of fragmentation in the datacenter, but, my observations seem to indicate that the reverse has happened. My guess is that training isn’t what it used to be and we simply have too many server administrators that were promoted off of the retail floor or the end-user help desk a bit too quickly. This is important to understand, though. Edge cases aside, fragmentation is of no concern for a properly architected server-class system. If you are using disks of an appropriate speed in a RAID array of an appropriate size, you will never realize meaningful performance improvements from a defragmentation cycle. If you are experiencing issues that you believe are due to fragmentation, expanding your array by one member (or two for RAID-10) will return substantially greater yields than the most optimized disk layout.
Disk Fragmentation and Hyper-V
To conceptualize the effect of fragmentation on Hyper-V, just think about the effect of fragmentation in general. When you think of disk access on a fragmented volume, you’ve probably got something like this in mind:
Look about right? Maybe a bit more complicated than that, but something along those lines, yes?
Now, imagine a Hyper-V system. It’s got, say, three virtual machines with their VHDX files in the same location. They’re all in the fixed format and the whole volume is nicely defragmented and pristine. As the virtual machines are running, what does their disk access look like to you. Is it like this?:
If you’re surprised that the pictures are the same, then I don’t think that you understand virtualization. All VMs require I/O and they all require their I/O more or less concurrently with I/O needs of other VMs. In the first picture, access had to skip a few blocks because of fragmentation. In the second picture, access had to skip a few blocks because it was another VM’s turn. I/O will always be a jumbled mess in a shared-storage virtualization world. There are mitigation strategies, but defragmentation is the most useless.
For fragmentation to be a problem, it must interrupt what would have otherwise been a smooth read or write operation. In other words, fragmentation is most harmful on systems that commonly perform long sequential reads and/or writes. A typical Hyper-V system hosting server guests is unlikely to perform meaningful quantities of long sequential reads and/or writes.
Disk Fragmentation and Dynamically-Expanding VHDX
Fragmentation is the most egregious of the copious, terrible excuses that people give for not using dynamically-expanding VHDX. If you listen to them, they’ll paint a beautiful word picture that will have you daydreaming that all the bits of your VHDX files are scattered across your LUNs like a bag of Trail Mix. I just want to ask anyone who tells those stories: “Do you own a computer? Have you ever seen a computer? Do you know how computers store data on disks? What about Hyper-V, do you have any idea how that works?” I’m thinking that there’s something lacking on at least one of those two fronts.
The notion fronted by the scare message is that your virtual machines are just going to drop a few bits here and there until your storage looks like a finely sifted hodge-podge of multicolored powders. The truth is that your virtual machines are going to allocate a great many blocks in one shot, maybe again at a later point in time, but will soon reach a sort of equilibrium. An example VM that uses a dynamically-expanding disk:
- You create a new application server from an empty Windows Server template. Hyper-V writes that new VHDX copy as contiguously as the storage system can allow
- You install the primary application. This causes Hyper-V to request many new blocks all at once. A large singular allocation results in the most contiguous usage possible
- The primary application goes into production.
- If it’s the sort of app that works with big gobs of data at a time, then Hyper-V writes big gobs, which are more or less contiguous.
- If it’s the sort of app that works with little bits of data at a time, then fragmentation won’t matter much anyway
- Normal activities cause a natural ebb and flow of the VM’s data usage (ex: downloading and deleting Windows Update files). A VM will re-use previously used blocks because that’s what computers do.
How to Address Fragmentation in Hyper-V
I am opposed to ever taking any serious steps to defragmenting a server system. It’s just a waste of time and causes a great deal of age-advancing disk thrashing. If you’re really concerned about disk performance, these are the best choices:
- Add spindles to your storage array
- Use faster disks
- Use a faster array type
- Don’t virtualize
If you have read all of this and done all of these things and you are still panicked about fragmentation, then there is still something that you can do. Get an empty LUN or other storage space that can hold your virtual machines. Use Storage Live Migration to move all of them there. Then, use Storage Live Migration to move them all back, one at a time. It will line them all up neatly end-to-end. If you want, copy in some “buffer” files in between each one and delete them once all VMs are in place. These directions come with a warning: you will never recover the time necessary to perform that operation.
When using down-level Windows systems as Hyper-V guests, you might notice some “Unknown Devices”. Since Device Manager was first introduced, the best way to deal with this has been to first determine if anything isn’t working as expected and only worry about the unidentified items in that case. Of course, even if there’s nothing to worry about, you’d probably like to know what they are.
If the Windows guest operating system doesn’t match the host’s version, then you should not be surprised at having some unknown entries in Device Manager. However, you should always begin by verifying that the Integration Services are up to date.
Once you’re certain that the Integration Services are up to date, you can set about verifying that there’s nothing wrong with what’s left. Be aware that the information in this article was written using a Windows Server 2008 R2 guest. The exact steps may vary for other operating systems. Also, this article was written for 2012 R2 as the host. Other versions will have different characteristics.
In Device Manager on the affected guest, there will be an Other Devices section with two items marked as Unknown Device.
Other Devices Section
If there are any more, then either the Integration Services aren’t fully installed or you’re connected in over RDP and a device is being mapped through that session that the guest does not recognize.
To determine what device you’re actually looking at, double-click it or right-click on it and click Properties to open the the device’s property sheet. Switch to the Details tab and make sure that the drop-down box is set to Hardware IDs.
The first device displays the following IDs:
You can cross-reference this against the devices on a 2012 R2 system. A little research reveals:
Remote Desktop Control Channel
These IDs belong to a device called the “Microsoft Hyper-V Remote Desktop Control Channel”. This is the driver set that enables the new Enhanced Session mode. It only works for Windows 8.1 and Windows Server 2012 R2 guests and later. This is because there is no driver that has been built for earlier versions (and possibly other reasons).
The second device shows these IDs:
This one turns out to belong to the “Microsoft Hyper-V Activation Component”. This one will be enabled on both Windows 8.1 and Windows Server 2012 R2 and later, but it will only function on the server platform. This is the service that interacts with the host when you use one of the Automatic Virtual Machine Activation keys. If the host is running Windows Server 2012 R2 Datacenter, then the combination of that key, the virtual software device, and the host operating system are what enable that functionality. Keys only exist for the Windows Server platform because the desktop operating systems are not covered by the guest virtualization rights of Datacenter Edition. Older versions don’t work partly because no driver exists for them to perform the same interaction and partly because there are no AVMA keys available for them.
Of course, I don’t actually know if back-porting these features to earlier operating systems would just be a matter of Microsoft building drivers for these devices. What is certain is that these devices are presented by the host operating system to virtual machines. As with devices on any physical system, they will be detected by any plug-and-play compliant operating system. There is no way to stop them from showing up on older operating systems that are guests on a 2012 R2 host, but there is also no harm in their presence. Just ignore them and everything will be fine.