Fragmentation is the most crippling problem in computing, wouldn’t you agree? I mean, that’s what the strange guy downtown paints on his walking billboard, so it must be true, right? And fragmentation is at least five or six or a hundred times worse for a VHDX file, isn’t it? All the experts are saying so, according to my psychic.
But, when I think about it, my psychic also told me that I’d end up rich with a full head of hair. And, I watched that downtown guy lose a bet to a fire hydrant. Maybe those two aren’t the best authorities on the subject. Likewise, most of the people that go on and on about fragmentation can’t demonstrate anything concrete that would qualify them as storage experts. In fact, they sound a lot like that guy that saw your employee badge in the restaurant line and ruined your lunch break by trying to impress you with all of his anecdotes proving that he “knows something about computers” in the hopes that you’d put in a good word for him with your HR department (and that they have a more generous attitude than his previous employers on the definition of “reasonable hygiene practices”).
To help prevent you from ever sounding like that guy, we’re going to take a solid look at the “problem” of fragmentation.
Where Did All of this Talk About Fragmentation Originate?
Before I get very far into this, let me point out that all of this jabber about fragmentation is utter nonsense. Most people that are afraid of it don’t know any better. The people that are trying to scare you with it either don’t know what they’re talking about or are trying to sell you something. If you’re about to go to the comments section with some story about that one time that a system was running slowly but you set everything to rights with a defrag, save it. I once bounced a quarter across a twelve foot oak table, off a guy’s forehead, and into a shot glass. Our anecdotes are equally meaningless, but at least mine is interesting and I can produce witnesses.
The point is, the “problem” of fragmentation is mostly a myth. Like most myths, it does have some roots in truth. To understand the myth, you must know its origins.
These Aren’t Your Uncle’s Hard Disks
In the dark ages of computing, hard disks were much different from the devices that you know and love today. I’m young enough that I missed the very early years, but the first one owned by my family consumed the entire top of a desktop computer chassis. I was initially thrilled when my father presented me with my very own PC as a high school graduation present. I quickly discovered that it was a ploy to keep me at home a little longer because it would be quite some time before I could afford an apartment large enough to hold its hard drive. You might be thinking, “So what, they were physically bigger. I have a dozen magazines claiming that size doesn’t matter!” Well, those articles weren’t written about computer hard drives, were they? In hard drives, physical characteristics matter.
Old Drives Were Physically Larger
The first issue is diameter. Or, more truthfully, radius. You see, there’s a little arm inside that hard drive whose job it is to move back and forth from the inside edge to the outside edge of the platter and back, picking up and putting down bits along the way. That requires time. The further the distance, the more time required. Even if we pretend that actuator motors haven’t improved at all, less time is required to travel a shorter distance. I don’t know actual measurements, but it’s a fair guess that those old disks had over a 2.5-inch radius, whereas modern 3.5″ disks are closer to a 1.5″ radius and 2.5″ disks something around a 1″ radius. It doesn’t sound like much until you compare them by percentage differences. Modern enterprise-class hard disks have less than half the maximum read/write head travel distance of those old units.
It’s not just the radius. The hard disk that I had wasn’t only wide, it was also tall. That’s because it had more platters in it than modern drives. That’s important because, whereas each platter has its own set of read/write heads, a single motor controls all of the arms. Each additional platter increases the likelihood that the read/write head arm will need to move a meaningful distance to find data between any two read/write operations. That adds time.
Old Drives Were Physically Slower
After size, there’s rotational speed. The read/write heads follow a line from the center of the platter out to the edge of the platter, but that’s their only range of motion. If a head isn’t above the data that it wants, then it must hang around and wait for that data to show up. Today, we think of 5,400 RPM drives as “slow”. That drive of mine was moping along at a meagerly 3,600 RPM. That meant even more time was required to get/set data.
There were other factors that impacted speed as well, although none quite so strongly as rotational speed improvements. The point is, physical characteristics in old drives meant that they pushed and pulled data much more slowly than modern drives.
Old Drives Were Dumb
Up until the mid-2000s, every drive in (almost) every desktop computer used a PATA IDE or EIDE interface (distinction is not important for this discussion). A hard drive’s interface is the bit that sits between the connecting cable bits and the spinning disk/flying head bits. It’s the electronic brain that figures out where to put data and where to go get data. IDE brains are dumb (another word for “cheap”). They operate on a FIFO (first-in first-out) basis. This is an acronym that everyone knows but almost no one takes a moment to think about. For hard drives, it means that each command is processed in exactly the order in which it was received. Let’s say that it gets the following:
- Read data from track 1
- Write data to track 68,022
- Read data from track 2
An IDE drive will perform those operations in exactly that order, even though it doesn’t make any sense. If you ever wondered why SCSI drives were so much more expensive than IDE drives, that was part of the reason. SCSI drives were a lot smarter. They would receive a list of demands from the host computer, plot the optimal course to satisfy those requests, and execute them in a logical fashion.
In the mid-2000s, we started getting new technology. AHCI and SATA emerged from the primordial acronym soup as Promethean saviors, bringing NCQ (native command queuing) to the lowly IDE interface. For the first time, IDE drives began to behave like SCSI drives. … OK, that’s overselling NCQ. A lot. It did help, but not as much as it might have because…
Operating Systems Take More Responsibility
It wasn’t just hard drives that operated in FIFO. Operating systems started it. They had good excuses, though. Hard drives were slow, but so were all of the other components. A child could conceive of better access techniques than FIFO, but even PhDs struggled against the CPU and memory requirements to implement them. Time changed all of that. Those other components gained remarkable speed improvements while hard disks lagged behind. Before “NCQ” was even coined, operating systems learned to optimize requests before sending them to the IDE’s FIFO buffers. That’s one of the ways that modern operating systems manage disk access better than those that existed at the dawn of defragmentation, but it’s certainly not alone.
This Isn’t Your Big Brother’s File System
The venerated FAT file system did its duty and did it well. But, the nature of disk storage changed dramatically, which is why we’ve mostly stopped using FAT. Now we have NTFS, and even that is becoming stale. Two things that it does a bit better than FAT is metadata placement and file allocation. Linux admins will be quick to point out that virtually all of their file systems are markedly better at preventing fragmentation than NTFS. However, most of the tribal knowledge around fragmentation on the Windows platform sources from the FAT days, and NTFS is certainly better than FAT.
Some of Us Keep Up with Technology
It was while I owned that gigantic, slow hard drive that the fear of fragmentation wormed its way into my mind. I saw some very convincing charts and graphs and read a very good spiel and I deeply absorbed every single word and took the entire message to heart. That was also the same period of my life in which I declined free front-row tickets to Collective Soul to avoid rescheduling a first date with a girl with whom I knew I had no future. It’s safe to say that my judgment was not sound during those days.
Over the years, I became a bit wiser. I looked back and realized some of the mistakes that I’d made. In this particular case, I slowly came to understand that everything that convinced me to defragment was marketing material from a company that sold defragmentation software. I also forced myself to admit that I never could detect any post-defragmentation performance improvements. I had allowed the propaganda to sucker me into climbing onto a bandwagon carrying a lot of other suckers, and we reinforced each others’ delusions.
That said, we were mostly talking about single-drive systems in personal computers. That transitions right into the real problem with the fragmentation discussion.
Server Systems are not Desktop Systems
I was fortunate enough that my career did not immediately shift directly from desktop support into server support. I worked through a gradual transition period. I also enjoyed the convenience of working with top-tier server administrators. I learned quickly, and thoroughly, that desktop systems and server systems are radically different.
You rely on your desktop or laptop computer for multiple tasks. You operate e-mail, web browsing, word processing, spreadsheet, instant messaging, and music software on a daily basis. If you’re a gamer, you’ve got that as well. Most of these applications use small amounts of data frequently and haphazardly; some use large amounts of data, also frequently and haphazardly. The ratio of write operations to read operations is very high, with writes commonly outnumbering reads.
Servers are different. Well-architected servers in an organization with sufficient budget will run only one application or application suite. If they use much data, they’ll rely on a database. In almost all cases, server systems perform substantially more read operations than write operations.
The end result is that server systems almost universally have more predictable disk I/O demands and noticeably higher cache hits than desktop systems. Under equal fragmentation levels, they’ll fare better.
Whether or not you’d say that server-class systems contain “better” hardware than desktop system is a matter of perspective. Server systems usually provide minimal video capabilities and their CPUs have gigantic caches but are otherwise unremarkable. That only makes sense; playing the newest Resident Evil at highest settings with a smooth frame rate requires substantially more resources than a domain controller for 5,000 users. Despite what many lay people have come to believe, server systems typically don’t work very hard. We build them for reliability, not speed.
Where servers have an edge is storage. SCSI has a solid record as the premier choice for server-class systems. For many years, it was much more reliable, although the differences are negligible today. One advantage that SCSI drives maintain over their less expensive cousins is higher rotational speeds. Of all the improvements that I mentioned above, the most meaningful advance in IDE drives was the increase of rotational speed from 3,600 RPM to 7,200 RPM. That’s a 100% gain. SCSI drives ship with 10,000 RPM motors (~38% faster than 7,200 RPM) and 15,000 RPM motors (108% faster than 7,200 RPM!).
Spindle speed doesn’t address the reliability issue, though. Hard drives need many components, and a lot of them move. Mechanical failure due to defect or wear is a matter of “when”, not “if”. Furthermore, they are susceptible to things that other component designers don’t even think about. If you get very close to a hard drive and shout at it while it’s powered, you can cause data loss. Conversely, my solid-state phone doesn’t seem to suffer nearly as much as I do even after the tenth attempt to get “OKAY GOOGLE!!!” to work as advertised.
Due to the fragility of spinning disks, almost all server systems architects design them to use multiple drives in a redundant configuration (lovingly known as RAID). The side effect of using multiple disks like this is a speed boost. We’re not going to talk about different RAID types because that’s not important here. The real point is that in practically all cases, a RAID configuration is faster than a single disk configuration. The more unique spindles in an array, the higher its speed.
With SCSI and RAID, it’s trivial to achieve speeds that are many multipliers faster than a single disk system. If we assume that fragmentation has ill effects and that defragmentation has positive effects, they are mitigated by the inherent speed boosts of this topology.
These Differences are Meaningful
When I began taking classes to train desktop support staff to become server support staff, I managed to avoid asking any overly stupid questions. My classmates weren’t so lucky. One asked about defragmentation jobs on server systems. The echoes of laughter were still reverberating through the building when the instructor finally caught his breath enough to choke out, “We don’t defragment server systems.” The student was mortified into silence, of course. Fortunately, there were enough shared sheepish looks that the instructor felt compelled to explain it. That was in the late ’90s, so the explanation was a bit different then, but it still boiled down to differences in usage and technology.
With today’s technology, we should be even less fearful of fragmentation in the datacenter, but, my observations seem to indicate that the reverse has happened. My guess is that training isn’t what it used to be and we simply have too many server administrators that were promoted off of the retail floor or the end-user help desk a bit too quickly. This is important to understand, though. Edge cases aside, fragmentation is of no concern for a properly architected server-class system. If you are using disks of an appropriate speed in a RAID array of an appropriate size, you will never realize meaningful performance improvements from a defragmentation cycle. If you are experiencing issues that you believe are due to fragmentation, expanding your array by one member (or two for RAID-10) will return substantially greater yields than the most optimized disk layout.
Disk Fragmentation and Hyper-V
To conceptualize the effect of fragmentation on Hyper-V, just think about the effect of fragmentation in general. When you think of disk access on a fragmented volume, you’ve probably got something like this in mind:
Look about right? Maybe a bit more complicated than that, but something along those lines, yes?
Now, imagine a Hyper-V system. It’s got, say, three virtual machines with their VHDX files in the same location. They’re all in the fixed format and the whole volume is nicely defragmented and pristine. As the virtual machines are running, what does their disk access look like to you. Is it like this?:
If you’re surprised that the pictures are the same, then I don’t think that you understand virtualization. All VMs require I/O and they all require their I/O more or less concurrently with I/O needs of other VMs. In the first picture, access had to skip a few blocks because of fragmentation. In the second picture, access had to skip a few blocks because it was another VM’s turn. I/O will always be a jumbled mess in a shared-storage virtualization world. There are mitigation strategies, but defragmentation is the most useless.
For fragmentation to be a problem, it must interrupt what would have otherwise been a smooth read or write operation. In other words, fragmentation is most harmful on systems that commonly perform long sequential reads and/or writes. A typical Hyper-V system hosting server guests is unlikely to perform meaningful quantities of long sequential reads and/or writes.
Disk Fragmentation and Dynamically-Expanding VHDX
Fragmentation is the most egregious of the copious, terrible excuses that people give for not using dynamically-expanding VHDX. If you listen to them, they’ll paint a beautiful word picture that will have you daydreaming that all the bits of your VHDX files are scattered across your LUNs like a bag of Trail Mix. I just want to ask anyone who tells those stories: “Do you own a computer? Have you ever seen a computer? Do you know how computers store data on disks? What about Hyper-V, do you have any idea how that works?” I’m thinking that there’s something lacking on at least one of those two fronts.
The notion fronted by the scare message is that your virtual machines are just going to drop a few bits here and there until your storage looks like a finely sifted hodge-podge of multicolored powders. The truth is that your virtual machines are going to allocate a great many blocks in one shot, maybe again at a later point in time, but will soon reach a sort of equilibrium. An example VM that uses a dynamically-expanding disk:
- You create a new application server from an empty Windows Server template. Hyper-V writes that new VHDX copy as contiguously as the storage system can allow
- You install the primary application. This causes Hyper-V to request many new blocks all at once. A large singular allocation results in the most contiguous usage possible
- The primary application goes into production.
- If it’s the sort of app that works with big gobs of data at a time, then Hyper-V writes big gobs, which are more or less contiguous.
- If it’s the sort of app that works with little bits of data at a time, then fragmentation won’t matter much anyway
- Normal activities cause a natural ebb and flow of the VM’s data usage (ex: downloading and deleting Windows Update files). A VM will re-use previously used blocks because that’s what computers do.
How to Address Fragmentation in Hyper-V
I am opposed to ever taking any serious steps to defragmenting a server system. It’s just a waste of time and causes a great deal of age-advancing disk thrashing. If you’re really concerned about disk performance, these are the best choices:
- Add spindles to your storage array
- Use faster disks
- Use a faster array type
- Don’t virtualize
If you have read all of this and done all of these things and you are still panicked about fragmentation, then there is still something that you can do. Get an empty LUN or other storage space that can hold your virtual machines. Use Storage Live Migration to move all of them there. Then, use Storage Live Migration to move them all back, one at a time. It will line them all up neatly end-to-end. If you want, copy in some “buffer” files in between each one and delete them once all VMs are in place. These directions come with a warning: you will never recover the time necessary to perform that operation.