How to Compact a VHDX with a Linux Filesystem

Microsoft’s compact tool for VHD/X works by deleting empty blocks. “Empty” doesn’t always mean what you might think, though. When you delete a file, almost every file system simply removes its entry from the allocation table. That means that those blocks still contain data; the system simply removes all indexing and ownership. So, those blocks are not empty. They are unused. When a VHDX contains file systems that the VHDX driver recognizes, it can work intelligently with the contained allocation table to remove unused blocks, even if they still contain data. When a VHDX contains file systems commonly found on Linux (such as the various iterations of ext), the system needs some help.

Making Some Space

Before we start, a warning: don’t even bother with this unless you can reclaim a lot of space. There is no value in compacting a VHDX just because it exists. In my case, I had something go awry in my system that caused the initramfs system to write gigabytes of data to its temporary folder. My VHDX that ordinarily used around 5 GB ballooned to 50 GB in a short period of time.

Begin by getting your bearings. df can show you how much space is in use. I neglected to get a screen shot prior to writing this article, but this is what I have now:

cpctlvhd_baseline

At this time, I’m sitting at a healthy 5% usage. When I began, I had 80% usage.

Clean up as much as you can. Use apt autoremove, apt autoclean, and apt clean on systems that use apt. Use yum clean all on yum systems. Check your /var/tmp folder. If you’re not sure what’s consuming all of your space, du can help. To keep it manageable, target specific folders. You can save the results to a file by redirecting the output, for example sudo du /var/tmp > ~/var-temp-du.

You can then open the /home/<your account>/var-temp-du file using WinSCP. It’s a tab-delimited file, so you can manipulate it easily. Paste into Excel, and you can sort by size.
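If you would rather skip the round trip through Excel, the same sort can be sketched in a few lines of Python. This is only a minimal sketch: it assumes the file holds plain du output (a size in 1K blocks, a tab, then the path) that was not produced with -h, and the function name largest_entries is my own invention.

```python
def largest_entries(du_text, top=10):
    """Return the `top` largest (size_kb, path) pairs from plain `du` output."""
    entries = []
    for line in du_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        size, _, path = line.partition("\t")
        entries.append((int(size), path))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries[:top]


# Made-up sample data in the same shape that du writes.
sample = "4\t/var/tmp/a\n2048\t/var/tmp/initramfs_x\n2056\t/var/tmp\n"
for size_kb, path in largest_entries(sample, top=2):
    print(f"{size_kb}\t{path}")
```

To run it against a real file, read the file's text and pass it in place of the sample string.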

More user-friendly downloadable tools exist. I tried gt5 with some luck.

As I mentioned before, I had gigabytes of files in /var/tmp created by initramfs. I’m not sure what it used to create the names, but they all started with “initramfs”. So, I removed them that way: rm /var/tmp/initramfs* -r. That alone brought me down to the lovely number that you see above. However, as you’re well aware, the VHDX remains at its expanded size.

Don’t forget to df after cleanup! If the usage hasn’t changed much, then I’d stop here and either find something else to delete or find something else to do altogether.

Zeroing a VHDX with an ext Filesystem

I assume that this process will work with any file system at all, but I’ve only tested with ext4. Your mileage may vary.

Because the VHDX cannot parse the file system, it can only remove blocks that contain all zeros. With that knowledge, we now have a goal: zero out unused blocks. We’ll need to do that from within the guest.
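To make the distinction between “unused” and “empty” concrete, here is a toy model in Python. It is not how the VHDX driver is implemented; it only illustrates that a scan which sees raw block contents, and nothing about the file system, can reclaim a block only when every byte in it is zero. The four-byte block size and the sample contents are assumptions chosen for readability.

```python
BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is readable


def reclaimable(blocks):
    """Indexes of blocks a content-only scan can drop: those that are all zeros."""
    zero = bytes(BLOCK_SIZE)
    return [index for index, block in enumerate(blocks) if block == zero]


# Block 0 belongs to a live file, block 1 belonged to a deleted file and still
# holds its old bytes, block 2 was never written.
disk = [b"LIVE", b"GONE", bytes(BLOCK_SIZE)]
print(reclaimable(disk))     # [2]

disk[1] = bytes(BLOCK_SIZE)  # zero the unused block from inside the guest
print(reclaimable(disk))     # [1, 2]
```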

Preferred Method: fstrim

My personal favorite method for handling this is the “fstrim” utility. Reasons:

  • fstrim works very quickly
  • fstrim doesn’t cause unnecessary wear on SSDs but still works on spinning rust
  • fstrim ships in the default tool set of most distributions
  • fstrim is ridiculously simple to use

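Usage: run fstrim as root against the mount point of the file system that you want to trim, for example sudo fstrim / for the root file system.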

On my system that had recently shed over 70 GB of fat, fstrim completed in about 5 seconds.

Note: according to some notes that I found for Ubuntu, it automatically performs an fstrim periodically. I assume that you’re here because you want this done now, so this information probably serves mostly as FYI.

Alternative Zeroing Methods

If fstrim doesn’t work for you, then we need to look at tools designed to write zeros to unused blocks.

I would caution you away from using security tools.  They commonly make multiple passes of non-zero writes for security purposes on magnetic media. That’s because an analog reader can detect charge levels that are too low to register as a “1” on your drive’s internal digital head. They can interpret them as earlier write operations. After three forced writes to the same location, even analog equipment won’t read anything. On an SSD, though, those writes will mostly reduce its lifespan. Also, non-zero writes are utterly pointless for what we’re doing. Some security tools will write all zeros. That’s better, but they also make multiple passes. We only need one.

Create a File from /dev/zero

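Linux includes a nifty built-in device, /dev/zero, that just generates zeroes until you stop asking. You can leverage it by “reading” from it and outputting to a file that you create just for this purpose. When the file system fills up and the write fails, delete the file to hand the newly zeroed space back.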

On a physical system, this operation would always take a very long time because it literally writes zeros to every unused block in the file system. Hyper-V will realize that the bits being written are zeroes. So, when it hits a block that hasn’t already been expanded, it will just ignore the write. However, the blocks that do contain data will be zeroed, so this can still take some time. So, it’s not nearly as fast as fstrim, but it’s also not going to make the VHDX grow any larger than it already is.
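For readers who would rather script the zero-fill than type it by hand, the idea can be sketched in Python. Treat this as an illustration under stated assumptions rather than the procedure used for this article: the function name zero_fill, the one megabyte chunk size, and the max_bytes safety cap are all my own additions. With max_bytes left at None it keeps writing until the file system reports that it is full, so point it only at the file system that you intend to clear.

```python
import errno
import os
import tempfile


def zero_fill(path, chunk_size=1024 * 1024, max_bytes=None):
    """Write zeros to `path` until the file system is full (or `max_bytes` is
    reached), flush them to disk, delete the file, and return the byte count."""
    chunk = bytes(chunk_size)
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        while max_bytes is None or written < max_bytes:
            want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
            try:
                written += os.write(fd, chunk[:want])
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    break  # file system is full: stop writing
                raise
        os.fsync(fd)  # make sure the zeros actually reach the virtual disk
    finally:
        os.close(fd)
        os.remove(path)  # give the zeroed space back to the file system
    return written


# Harmless demonstration: capped at 10,000 bytes inside a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    print(zero_fill(os.path.join(scratch, "zeros.tmp"), chunk_size=4096, max_bytes=10000))
```

Run as an ordinary user, a fill like this will not reach any blocks that the file system reserves for root.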

zerofree

The “zerofree” package can be installed with your package manager from the default repository (on most distributions). It has major issues that might be show-stoppers:

  • I couldn’t find any way to make it work with LVM volumes. I found some people that did, but their directions didn’t work for me. That might be because of my disk system, because…
  • It’s not recommended for ext4 or xfs file systems. If your Linux system began life as a recent version, you’re probably using ext4 or xfs.
  • Zerofree can’t work with mounted file systems. That means that it can’t work with your active primary file system.
  • You’ll need to detach it and attach it to another Linux guest. You could also use something like a bootable recovery disk that has zerofree.

If you attach it to a foreign system, run sudo lsblk -f to locate the attached disk and file systems.

Verify that the target volume/file system does not appear in df. If it shows up in that list, you’ll need to unmount it before you can work with it.

I’ve highlighted the only volume on my added disk that is safe to work with. It’s a tiny system volume in my case so zeroing it probably won’t do a single thing for me. I’m showing you this in the event that you have an ext2 or ext3 file system in one of your own Linux guests with a meaningful amount of space to free. Once you’ve located the correct partition whose free space you wish to clear, run zerofree against its device node, for example sudo zerofree /dev/sdb1 (that device name is only a placeholder; use the one that lsblk showed you).

Search!

In my research for this article, I found a number of search hits that looked somewhat promising. If nothing here works for you, look for other ways. Remember that your goal is to zero out the unused space in your Linux file system.

Compact the VHDX

The compact process itself does not differ, regardless of the contained file system. If you already know how to compact a dynamically-expanding VHDX, you’ll learn nothing else from me here.

As with the file delete process, I always recommend that you look at the VHDX in Explorer or the directory listing of a command/PowerShell prompt so that you have a “before” idea of the file.

Use PowerShell to Compact a Dynamically-Expanding VHDX

The owning virtual machine must be Off or Saved. Do not compact a VHDX that is a parent of a differencing disk. It might work, but really, it’s not worth taking any risks.

Use the Optimize-VHD cmdlet to compact a VHDX. Supply the file in the Path parameter and, optionally, a Mode value such as Full.

The help for that cmdlet indicates that -Mode Full “scans for zero blocks and reclaims unused blocks”. However, it then goes on to say that the VHDX must be mounted in read-only mode for that to work. The wording is unclear and can lead to confusion. The zero block scan should always work. The unused block part requires the host to be able to read the contained file system; that’s why it needs to be mounted. The contained file system must also be NTFS for that to work at all. All of that only applies to blocks that are unused but not zeroed. The above exercise zeroed those unused blocks. So, this will work for Linux file systems without mounting.

Use Hyper-V Manager to Compact a Dynamically-Expanding VHDX

Hyper-V Manager connects you to a VHDX tool to provide “editing” capabilities. The options for “editing” include compacting. It can work for VHDXs that are attached to a VM or are sitting idle.

Start the Edit Wizard on a VM-Attached VHDX

The virtual machine must be Off or Saved. If the virtual machine has checkpoints, you will be compacting the active VHDX.

Open the property sheet for the virtual machine. On the left, highlight the disk to compact. On the right, click the Edit button.

cpctlvhd_vmdiskselect

Jump past the next sub-section to continue.

Start the Edit Wizard on a Detached VHDX

The VHDX compact tool that Hyper-V Manager uses relies on a Hyper-V host. If you’re using Hyper-V Manager from a remote system, that means something special to you. You must first select the Hyper-V host that will be performing the compact, then select the VHDX that you want that host to compact.

Select the host first:

cpctlvhd_hostselect

Now, you can either right-click on that host and click Edit Disk or you can use the Edit Disk link in the far right Actions pane; they both go to the same wizard.

cpctlvhd_editdisk

The first screen of the wizard is informational. Click Next on that. After that, you’ll be at the first actionable page. Read on in the next sub-section.

Using the Edit Disk Wizard to Compact a VHDX

Both of the above processes will leave you on the Locate Disk page. The difference is that if you started from a virtual machine’s property sheet, the disk selector will be grayed out. For a standalone disk, enter or browse to the target VHDX. Remember that the dialog and tool operate from the perspective of the host. If you connected Hyper-V Manager to a remote host, there may be delegation issues on SMB-hosted systems.

cpctlvhd_locatedisk

On the next screen, choose Compact:

cpctlvhd_compactoption

The final page allows you to review and cancel if desired. Click Finish to start the process:

cpctlvhd_wizfinish

Depending on how much work it has to do, this could be a quick or slow process. Once it’s completed, it will simply return to the last thing you were doing. If you started from a virtual machine, you’ll return to its property sheet. Otherwise, you’ll simply return to Hyper-V Manager.

Check the Outcome

Locate your VHDX in Explorer or a directory listing to ensure that it shrank. My disk has returned to its happy 5 GB size:

cpctlvhd_results

 

Hyper-V Differencing Disks Explained

Usually when we talk about Hyper-V’s virtual disk types, we focus on fixed and dynamically expanding. There’s another type that enjoys significantly less press: differencing disks. Administrators don’t deal directly with differencing disks as often as they work with the other two types, but they are hardly rare. Your Hyper-V knowledge cannot be complete without an understanding of the form and function of differencing disks, so let’s take a look.

What are Hyper-V Differencing Disks?

A differencing disk contains block data that represents changes to a parent virtual hard disk. The salient properties of differencing disks are:

  • A differencing disk must have exactly one parent. No more, no less.
  • The parent of a differencing disk must be another virtual hard disk. You cannot attach them to pass-through disks, a file system, a LUN, a remote share, or anything else.
  • The parent of a differencing disk can be any of the three types (fixed, dynamically expanding, or differencing)
  • Any modification to the data of the parent of a differencing disk effectively orphans the differencing disk, rendering it useless
  • Hyper-V can merge the change data back into the parent, destroying the differencing disk in the process. For Hyper-V versions past 2008 R2, this operation can take place while the disk is in use

Typically, differencing disks are small. They can grow, however. They can grow to be quite large. The maximum size of a differencing disk is equal to the maximum size of the root parent. I say “root” because, even though a differencing disk can be the parent of another differencing disk, there must be a non-differencing disk at the very top for any of them to be useful. Be aware that a differencing disk attached to a dynamically expanding disk does have the potential to outgrow its parent, if that disk isn’t fully expanded.

How Do Differencing Disks Work?

The concept behind the functioning of a differencing disk is very simple. When Hyper-V needs to write to a virtual disk that has a differencing child, the virtual disk driver redirects the write into the differencing disk. It tracks which block(s) in the original file were targeted and what their new contents would have been.

Differencing Disk Write

The most important thing to understand is that the virtual disk driver makes a choice to write to the differencing disk. The file itself is not marked read-only. You cannot scan the file and discover that it has a child. The child knows who its parent is, but that knowledge is not reciprocated.

Writes are the hard part to understand. If you’ve got that down, then reads are easy to understand. When the virtual machine requests data from its disk, the virtual disk driver first checks to see if the child has a record of the requested block(s). If it does, then the child provides the data for the read. If the child does not have a record of any changes to the block(s), the virtual disk driver retrieves them from the parent.
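The write and read paths described above can be modeled in a few lines of Python. This is a toy illustration, not the virtual disk driver's actual data structures: the VirtualDisk class, its block map, and the single zero byte returned for never-written blocks are all assumptions made to keep the sketch small.

```python
class VirtualDisk:
    """A toy virtual disk: a sparse map of block number -> data, plus an optional parent."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}

    def write(self, block, data):
        # Writes always land in this disk; the parent is never touched.
        self.blocks[block] = data

    def read(self, block):
        # Check this disk first, then walk up the chain toward the root.
        disk = self
        while disk is not None:
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        return b"\x00"  # never written anywhere in the chain


root = VirtualDisk()
root.write(0, b"original")
child = VirtualDisk(parent=root)  # the child knows its parent, not the reverse
child.write(0, b"changed")        # the redirected write

print(child.read(0))  # b'changed'
print(root.read(0))   # b'original'
```

Notice that nothing in root records that child exists, which mirrors the point that you cannot scan a parent file and discover its children.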

This is a Hyper-V blog, so I mostly only talk about Hyper-V. However, the virtual disk driver is part of the Windows operating system. The normal tools that you have access to in Windows without Hyper-V cannot create a differencing disk, but you can mount one as long as its parent is present.

How are Differencing Disks Created?

Unlike fixed and dynamically expanding virtual hard disks, you don’t simply kick off a wizard and create a differencing disk from scratch. In fact, most Hyper-V administrators will never directly create a differencing disk at all. There are four generic methods by which differencing disks are created.

Backup Software

For most of us, backup software is the most likely source of differencing disks. When a Hyper-V aware backup application targets a virtual machine, Hyper-V will take a special checkpoint. While the disk and the state of the virtual machine are frozen in the checkpoint, the backup application can copy the contents without fear that they’ll change. When the backup is complete, Hyper-V deletes the checkpoint and merges the differencing disk that it created back into its parent. If it doesn’t, you have a problem and will need to talk to your backup vendor.

Note: backup software operations will always create differencing disks in the same location as the parent. You cannot override this behavior!

Standard and Production Checkpoints

Standard and Production Checkpoints are created by administrators, either manually or via scripts and other automated processes. As far as the disks are concerned, there isn’t much difference between any of the checkpoint types. Unlike backup checkpoints, Hyper-V will not automatically attempt to clean up standard or production checkpoints. That’s something that an administrator must do, also manually or via scripts and other automated processes.

Note: checkpoint operations will always create differencing disks in the same location as the parent. You cannot override this behavior!

Pooled Remote Desktop Services

For the rest of this article, I’m going to pretend that this method doesn’t exist. If you’re operating a full-blown Remote Desktop Services (RDS) operation for your virtual desktop infrastructure (VDI), then it’s using differencing disks. Your gold master is the source, and all of the virtual machines that users connect to are built on differencing disks. When a user’s session ends, the differencing disk is destroyed.

Manual Creation

Of the four techniques to create a differencing virtual hard disk, manual creation is the rarest. There aren’t a great many uses for this ability, but you might need to perform an operation similar to the gold master with many variants technique employed by VDI. It is possible to create many differencing disks from a single source and connect separate virtual machines to them. It can be tough to manage, though, and there aren’t any tools to aid you.

You can create a differencing disk based on any parent virtual hard disk using PowerShell or Hyper-V Manager.

Creating a Hyper-V Differencing Virtual Hard Disk with PowerShell

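The New-VHD cmdlet is the tool for this job. Supply the Path for the new disk, the ParentPath of the disk that it will build on, and the Differencing switch.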

Don’t forget to use tab completion, especially with ParentPath.

Creating a Hyper-V Differencing Virtual Hard Disk with Hyper-V Manager

Use the new virtual hard disk wizard in Hyper-V Manager to create a differencing disk:

  1. In Hyper-V Manager, right-click on the host to create the disk on, or use the Action pane in the far right. Click New, then Hard Disk.
    diff_startwiz
  2. Click Next on the informational screen.
  3. Choose VHD or VHDX. The differencing disk’s type must match its parent’s type. You cannot make a differencing disk of a VHDS file.
    diff_xornot
  4. Choose Differencing.
    diff_choosediff
  5. Enter the file name and path of the differencing disk that you want to create.
    diff_outpath
  6. Select the source virtual hard disk.
    diff_inpath
  7. Check your work and click Finish if you’re satisfied, or go Back and fix things.
    diff_finish

How Manual Differencing Disk Creation is Different

A differencing disk is a differencing disk; no matter how you create them, they are technologically identical. There are environmental differences, however. Keep these things in mind:

  • Hyper-V will automatically use the differencing disks created by backup, standard, and production checkpoints. It will retarget the connected virtual machine as necessary. No such automatic redirection occurs when you manually create a differencing disk. Remember that modifying a virtual hard disk that has differencing disks will render the children useless.
  • During manual creation of a differencing disk, you can specify a different target path for the differencing disk. While convenient, it’s tougher to identify that a virtual hard disk has children when they’re not all together.
  • Hyper-V Manager maintains a convenient tree view of standard and production checkpoints. Manually created differencing disks have no visual tools.
  • The checkpointing system will conveniently prepend an “A” (for “automatic”) to the extensions of the differencing disks it creates (and give them bizarre base file names). Both Hyper-V Manager and PowerShell will get upset if you attempt to use AVHD or AVHDX as an extension for a manually-created differencing disk. That makes sense, since “A” is for automatic, and “automatic” is the antonym of “manual”. Unfortunately, these tools are not supportive of “MVHD” or “MVHDX” extensions, either. If you do not give it an obvious base name, you could cause yourself some trouble.

You can use PowerShell’s Get-VHD cmdlet to detect the differencing disk type and its parent:

diff_psdiffdetective

The Inspect function in Hyper-V Manager does the same thing. You can find this function in the same Action menu that you used to start the disk creation wizard.

diff_hvinspect

I also wrote a PowerShell script that can plumb a VHD/X file for parent information. It’s useful when you don’t have the Hyper-V role enabled, because none of the above utilities can function without it. Head over to my orphaned Hyper-V file locator script and jump down to the Bonus Script heading. It’s a PowerShell veneer over .Net code, so it will also be of use if you’re looking to do something like that programmatically.

Merging Manually-Created Differencing Disks

Now that you know how to create differencing disks, it’s important to teach you how to merge them. Ordinarily, you’ll merge them back into their parents. You also have the option to create an entirely new disk that is a combination of the parent and child, but does not modify either. The merge process can also be done in PowerShell or Hyper-V Manager, but these tools have a different feature set.

Warning: Never use these techniques to merge a differencing disk that is part of a checkpointed VM back into its parent! Delete the checkpoint to merge the differencing disk instead. It is safe to merge a checkpointed VM’s disk into a different disk.

Merging a Hyper-V Differencing Virtual Hard Disk with PowerShell

Use the aptly-named Merge-VHD cmdlet to transfer the contents of the differencing disk into its parent. Supply the differencing disk as the Path.

The differencing disk is destroyed at the end of this operation.

PowerShell cannot be used to create a completely new target disk, for some reason. It does include a DestinationPath parameter, but that can only be used to skip levels in the differencing chain. For instance, let’s say that you have a root.vhdx with child diff1.vhdx that has its own child diff2.vhdx that also has its own child diff3.vhdx. You can use Merge-VHD -Path .\diff3.vhdx -DestinationPath .\diff1.vhdx to combine diff3.vhdx and diff2.vhdx into diff1.vhdx in a single pass. Without the DestinationPath parameter, diff3.vhdx would only merge into diff2.vhdx. You’d need to run Merge-VHD several times to merge the entire chain. Hyper-V Manager has no such capability.
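The level-skipping merge is easier to picture with a small model. The Python below is only a conceptual sketch: each disk is reduced to a dictionary of block number to contents, and the merge and read functions are hypothetical helpers of my own, not anything that Merge-VHD exposes. It shows that folding the newer layers into an older one, oldest first, leaves every block reading exactly as it did before.

```python
def merge(chain, source, destination):
    """Fold chain[destination+1 .. source] into chain[destination], oldest first,
    then drop the merged layers. `chain` is ordered root -> newest child."""
    for layer in chain[destination + 1 : source + 1]:
        chain[destination].update(layer)  # newer blocks overwrite older ones
    del chain[destination + 1 : source + 1]
    return chain


def read(chain, block):
    """The newest layer that holds the block wins."""
    for layer in reversed(chain):
        if block in layer:
            return layer[block]
    return None


# root.vhdx, diff1.vhdx, diff2.vhdx, diff3.vhdx as block maps
chain = [{0: "root"}, {1: "d1"}, {1: "d2", 2: "d2"}, {2: "d3"}]
before = [read(chain, block) for block in range(3)]

merge(chain, source=3, destination=1)  # diff3 and diff2 into diff1 in one pass
after = [read(chain, block) for block in range(3)]
print(len(chain), before == after)     # 2 True
```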

Merging a Hyper-V Differencing Virtual Hard Disk with Hyper-V Manager

Hyper-V Manager has a disk editing wizard for this task.

  1. In Hyper-V Manager, right-click on the host that will perform the merge, or use the Action pane in the far right. Click Edit Disk.
    diff_starteditwiz
  2. Click Next on the informational screen.
  3. Browse to the differencing disk that you wish to merge.
    diff_finddiffdisk
  4. Choose Merge.
    diff_mergeoption
  5. Choose to merge into the parent or into a new disk.
    diff_mergetarget
  6. Check your work and click Finish if you’re satisfied, or go Back and fix things.

If you chose to merge the disk into its parent, the differencing disk is destroyed at the end of the operation. If you chose to merge into a new disk, both the source and differencing disk are left intact.

Hyper-V Manager cannot merge multiple layers of a differencing chain the way that PowerShell can.

The Dangers of Differencing Disks

There are two risks with using differencing disks: performance and space.

Differencing Disk Performance Hits

When a differencing disk is in use, Hyper-V will need to jump back and forth from the child to the parent to find the data that it wants for reads. Writes are smoother as they all go to the differencing disk. On paper, this looks like a very scary operation. In practice, you are unlikely to detect any performance problems with a single differencing child. However, if you continue chaining differencing disk upon differencing disk, there will eventually be enough extraneous read operations that you’ll start having problems.

Also, merge operations require every single bit in the differencing disk to be transferred to the parent. That operation can cause an I/O storm. The larger the differencing disk is, the greater the impact of a merge operation.

Differencing Disk Space Issues

As mentioned earlier, a differencing disk can expand to the maximum size of its parent. If you have a root disk with a maximum size of 50 gigabytes, then any and all of its differencing disks can also grow to 50 gigabytes. If the root is dynamically expanding, then it is possible for its differencing disk(s) to exceed its size. For example, in this article I have used a completely empty root VHDX. It’s 4 megabytes in size. If I were to install an operating system into the differencing disk that I created for it, root.vhdx would remain at 4 megabytes in size while its differencing disk ballooned to whatever was necessary to hold that operating system.

A merge operation might require extra space, as well. If I were to merge that differencing disk with the OS back into the empty 4 megabyte root disk, then it would need to expand the root disk to accommodate all of those changed bits. It can’t destroy the differencing disk until the merge is complete, so I’m going to need enough space to hold that differencing disk twice. Once the merge is completed, the space used by the differencing disk will be reclaimed.
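The space arithmetic can be sketched as a quick upper-bound estimate. This is a rough model under loud assumptions: it treats the worst case where none of the child's blocks are already allocated in a dynamically expanding parent, it ignores file format overhead, and the function name merge_footprint is my own. The 4 MB root matches the example in this article; the 10,000 MB child is an assumed figure for illustration.

```python
def merge_footprint(parent_size, diff_size, parent_is_fixed=False):
    """Return (peak, final) on-disk footprint for merging a child into its parent.

    Upper-bound sketch: assumes no block in the child is already allocated in a
    dynamically expanding parent, so the parent grows by the child's full size.
    """
    growth = 0 if parent_is_fixed else diff_size
    peak = parent_size + growth + diff_size  # the child still exists during the merge
    final = parent_size + growth             # the child is deleted afterwards
    return peak, final


# Sizes in megabytes.
print(merge_footprint(4, 10_000))                              # (20004, 10004)
print(merge_footprint(50_000, 10_000, parent_is_fixed=True))   # (60000, 50000)
```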

If the root disk is fixed instead of dynamically expanding, then the merges will be written into space that’s already allocated. There will always be a space growth concern when merging trees, however, because differencing disks are also dynamically expanding and they merge from the bottom up.

Transplanting a Differencing Disk

Did a forgotten differencing disk run you out of space? Did a differencing disk get much larger than anticipated and now you can’t merge it back into its parent? No problem. Well… potentially no problem. All that you need is some alternate location to hold one or more of the disks. Follow these steps:

  1. Shut down any connected virtual machine. You cannot perform this operation live.
  2. Move the disk(s) in question. Which you move, and where you move them, is up to you. However, both locations must be simultaneously visible from a system that has the Merge-VHD cmdlet and the Hyper-V role installed. Remember that it is the disk that you are merging into that will grow (unless it’s fixed).
  3. If you moved the differencing disk and you’re still on the system that it came from, then you can probably just try the merge. It’s still pointed to the same source disk. Use Get-VHD or Inspect to verify.
  4. If you moved the root disk, run the Set-VHD cmdlet against the differencing disk and give it the new ParentPath. Your usage will obviously be different from mine, but what you’re attempting to do is set the ParentPath to reflect the true location of the parent’s VHDX. You can now attempt to merge the disk.

If a differencing disk somehow becomes disjoined from its parent, you can use Set-VHD to correct it. What you cannot do is use it to rejoin a differencing disk to a parent that has been changed. Even though the Set-VHD cmdlet may work, any merge operation will likely wreck the data in the root disk and render both unusable.

Understanding and Working with VHD(X) Files

A virtual machine is an abstract computer. It mimics a physical computer for the purpose of extending the flexibility of the computer system while still providing as many features of the physical environment as possible. Hyper-V, like most hypervisors, defines this abstract computer using files. The entities that abstract a physical computer’s hard drive(s) are, therefore, mere files. Specifically for Hyper-V, these files are in the VHDX format.

What is a VHD/X file?

VHDX is a semi-open file format that describes a virtual hard disk. The x was added to the current specification’s name so that it would not be confused with the earlier VHD format. Microsoft publishes this specification freely so that others can write their own applications that manipulate VHD/VHDX files, but Microsoft maintains sole responsibility for control of the format.

A VHDX mimics a hard disk. It is not related to formats, such as NTFS or FAT or EXT3. It is also not concerned with partitions. VHDX presents the same characteristics as a physical hard drive, or SSD, or SAN LUN, or any other block storage. It is up to some other component, such as the guest operating system, to define how the blocks are used. Simply, a VHDX that contains a possible NTFS format looks like the following:

VHDX Visualization

Where can VHDX Be Used?

This is a Hyper-V blog, so naturally, I will usually only bring up VHDX in that context. It is not particular to Hyper-V at all. Windows 7 and Windows Server 2008 R2 were able to directly open and manipulate VHD files. Windows 8+ and Windows Server 2012+ can natively open and manipulate VHDX files. For instance, the Disk Management tool in Windows 10 allows you to create and attach VHD and VHDX files:

Windows 10 VHD Menu

When mounted, the VHDX then looks to the operating system like any other disk:

Mounted VHDX

The most important thing to know is that whether or not you can use VHDX depends entirely upon the operating system that controls the file, not anything inside the data region of the file. The contents of the data area are an issue for the operating system that will control the file system. If it’s used by Hyper-V, then that is the guest operating system. If the VHDX is mounted in Windows 10, then Windows 10 will deal with it entirely. It will do so using two different mechanisms.

The VHDX Driver

Modern Windows operating systems, desktop, server, and Hyper-V, all include a driver for working with VHDX files. This is what that driver sees when it looks at a VHDX:

VHDX View from Hyper-V

VHDX View from Management Operating System

 

Windows 7 and Windows/Hyper-V Server 2008 R2 and earlier cannot mount a VHDX because they do not contain a VHDX driver. They can only work with the earlier VHD format. If a VHDX driver existed for those operating systems, they would theoretically be able to work with those file types. Logically, this is no different than attaching a disk via a SCSI card that the operating system may or may not recognize.

The File System Driver

The file system driver sees only this part:

VHDX View From Inside

There is no indication that the visible contents are held within a VHDX. They could just as easily be on a SAN LUN or a local SSD. For this reason, it does not matter at all to any guest operating system if you use VHDX or some other format. Linux guests can run perfectly well from their common ext3 and ext4 formats when inside a VHDX.

When a VHDX is mounted in Windows 10 or a server OS, it will require both the VHDX driver and the file system driver in order to be able to manipulate and read the contents of the file. You can mount a VHDX containing ext3 partitions inside Windows 10, but it will be unable to manipulate the contents because it doesn’t know what to do with ext3.

Should I Use VHD or VHDX?

If you will never use the virtual disk file with down-level management operating systems (Windows 7 or Windows/Hyper-V Server 2008 R2 or earlier), then you should always use VHDX (remember that guest operating systems don’t know or care which you use). Unless things have changed, Azure still can’t use a VHDX either, but you can replicate your VHDXs there. If you need to mount them on the Azure side, they will be automatically converted to VHD, although you do need to stay below the 1TB maximum size limit for Azure. If anyone has updated information that the Azure situation has changed, please let me know.

Configuration Information for VHDX Files

There are a few things to understand about VHDX files before creating them. I have explained the process for VHDX creation in another article.

Generation 1 (VHD) vs. Generation 2 (VHDX)

I’ve seen some people use the Generation 1 and Generation 2 labels with VHD and VHDX. They’re correct in the abstract sense, but the capitalization is wrong because these are not formal labels. I encourage you not to use these terms at all, because they easily become confused with Generation 1 and Generation 2 Hyper-V virtual machines. A Generation 1 virtual machine can use both VHD and VHDX as long as the management operating system can use both. A Generation 2 virtual machine can only utilize VHDX.

VHDX Block Sizes

The block size for a VHDX has nothing to do with anything that you know about block sizes for disks in any other context. For VHDX, block size is the increment by which a dynamically-expanding disk will grow when it needs to expand to hold more data. It can only be set at creation time, and only when you use PowerShell or native WMI commands to build the VHDX. For a fixed disk, it is always 0:

Block Size for Fixed VHDX

The default block size is 32 megabytes. This means that if a VHDX does not have enough space to satisfy the latest write request and has not yet reached the maximum configured size, the VHDX driver will allocate an additional 32 megabytes for the VHDX and will perform the write. If that is insufficient, it will continue allocating space in 32 megabyte blocks until the write is fully successful. While this article is not dedicated to dynamically expanding VHDXs, I want to point out that there is persistent FUD that these expansion events kill performance.

The people making that claim have absolutely no idea what they’re talking about. A heavily-written VHDX will either reach its maximum size very quickly and stop expanding or it will re-use blocks that it has already allocated. A VHDX simply cannot spend a significant portion of its lifetime in expansion, therefore it is impossible for expansion to cause a significant performance impact.
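To put the allocation increment in concrete terms, here is a minimal sketch of the arithmetic. The 100 megabyte shortfall is a hypothetical figure chosen for illustration, and the calculation only models the block-at-a-time growth described above; it does not query a real VHDX.

```powershell
# Default VHDX block size, as described above.
$blockSize = 32MB

# Hypothetical: a write needs 100 MB more than the file currently has allocated.
$shortfall = 100MB

# The file grows in whole blocks, so round up to the next full block.
$blocksNeeded = [math]::Ceiling($shortfall / $blockSize)
$allocated    = $blocksNeeded * $blockSize

# Prints: 4 blocks, 128 MB allocated
'{0} blocks, {1} MB allocated' -f $blocksNeeded, ($allocated / 1MB)
```

Once those blocks exist, later writes to the same region reuse them, which is why expansion is a short-lived phase rather than a steady cost.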

Expansion events will cause the new data blocks to be physically placed in the next available storage block on the management operating system’s file space. This, of course, will likely lead to the VHDX file becoming fragmented. Disk fragmentation in the modern datacenter is mostly a bogeyman used to terrify new administrators and sell unnecessary software or oversell hardware to uneducated decision makers, so expect to be confronted with a great deal of FUD about it. Just remember that disk access is always scattered when multiple virtual machines share a storage subsystem and that your first, best way to reduce storage performance bottlenecks is with more array spindles/SSDs.

For Linux systems, there is a soft recommendation to use a 1 megabyte block size. Linux file systems allocate space differently than NTFS/ReFS, so with the 32 megabyte default you will find much more slack space inside a Linux-containing VHDX than inside a Windows-containing VHDX. The recommendation is soft because leaving the default in place doesn’t really hurt anything; your space utilization just won’t be as efficient.
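As a rough sketch of how that recommendation could be applied, the following creates a dynamically expanding VHDX with a 1 megabyte block size and then reads the value back. The path and the 40 GB maximum size are hypothetical; the cmdlets come from the Hyper-V PowerShell module.

```powershell
# Hypothetical location and size for a new Linux guest disk.
$vhdxPath = 'D:\VHDX\linux-guest.vhdx'

# Block size can only be chosen at creation time.
New-VHD -Path $vhdxPath -SizeBytes 40GB -Dynamic -BlockSizeBytes 1MB

# Confirm the result; BlockSize is reported in bytes (1MB is 1048576).
Get-VHD -Path $vhdxPath | Select-Object Path, VhdType, BlockSize
```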

VHDX Sector Sizes

The term sector is somewhat outdated because it refers to a method of mapping physical drives that no one uses anymore. In more current terms, it signifies the minimum amount of space that can be used to store data on a disk. So, if you want to write a bit of data to the disk that is smaller than the sector size, it will pad the rest of the space with zeroes. The logical sector size is the smallest amount of data that the operating system will work with. The physical sector size is the smallest amount of data that the physical device will use. They don’t necessarily need to be the same size.

If the logical sector size is smaller than the physical sector size, read and write operations will be larger on the actual hard drive system. The operating system will discard the excess from read operations. On write operations, the physical drive will only make changes to the data indicated by the operating system although it may need to touch more data space than was called for. There are a great many discussions over “performance” and “best” and the like but they are all largely a waste of time. The VHDX driver is an intermediate layer responsible for translating all requests as best it can. The true performance points are all in the physical disk subsystem. To make your I/Os the fastest, use faster hardware. Do not waste time trying to tinker with VHDX sector sizes.

Remember that “portability” is a major component of virtualization. If you do spend a great deal of time ensuring that your VHDX files’ sector sizes are perfectly in-line with your physical subsystem, you may find that your next storage migration places your VHDXs in a sub-optimal configuration. The “best” thing to do is use the defaults for sector sizes and let the VHDX driver take care of it.
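If you only want to see which sector sizes a VHDX reports, without changing anything, a read-only check along these lines should be enough. The path is hypothetical.

```powershell
# Hypothetical path to an existing VHDX.
$vhdxPath = 'D:\VHDX\example.vhdx'

# Both sizes are reported in bytes. This only reads metadata from the file.
Get-VHD -Path $vhdxPath |
    Select-Object Path, LogicalSectorSize, PhysicalSectorSize
```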

VHDX Files Do Not Permanently Belong to a Virtual Machine

When you create a virtual machine using any of the available GUI methods, you’re also given the opportunity to create a VHDX. Behind the scenes, the virtual machine creation and the virtual hard disk creation are two separate steps, with the act of attaching the VHDX to the virtual machine being its own third operation. That final operation will set the permissions on the VHDX file so that the virtual machine can access it, but it does not mean that the VHDX requires the virtual machine in order to operate. A virtual hard disk file:

  • Can be detached from one virtual machine and attached to another
  • Can be re-assigned from one controller and/or controller location to another within the same virtual machine
  • Can be mounted in a management operating system

I have an article explaining the process for VHDX attachment; it shows the various elements for controller selection as well as remove options so that you can perform all of the above. As for the VHDX files themselves, they can be transported via copy, xcopy, robocopy, Windows Explorer, and other file manipulation tools. If a virtual machine somehow loses the ability to open its own VHDX file(s), use this process to detach and reattach the VHDX to the virtual machine; that will fix any security problems.

Mount a VHDX File to Read in Windows (Even Your Desktop!)

As briefly mentioned in the opening portion of this article, you can mount a VHDX file in Windows 10 (as well as Windows 8 and 8.1 and Windows Server 2012+). This allows you to salvage data from a VHDX file if its original operating system had some sort of fatal problem. You can use it for more benign situations, such as injecting installation files into a base image. Just right-click on the file in Windows Explorer and click Mount:

Mount VHDX Using Windows Explorer

All of the partitions will then be assigned drive letters in your management operating system and you can work with them as you would any other partition.

Note: When mounting VHDX files that contain boot partitions, you will sometimes get Access Denied messages because the file system drivers can’t read from those partitions. These messages do not impact the mount action.

When you’re done, just right-click on any of the partitions and click Eject. If there are no partitions visible in Windows Explorer, right-click the disk in Disk Management and click Detach VHD.

Mount a VHDX File Using PowerShell

You can use PowerShell to mount a VHDX only on systems that have Hyper-V installed. Just having the Hyper-V PowerShell module installed is not enough.

Mount a VHDX:
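A minimal sketch, assuming the Hyper-V role is installed on the local system; the file path is hypothetical:

```powershell
# Hypothetical path to the VHDX that you want to mount.
$vhdxPath = 'D:\VHDX\example.vhdx'

# Attach the virtual disk to the management operating system.
# Add -ReadOnly if you only intend to salvage data and want to avoid changes.
Mount-VHD -Path $vhdxPath

# Show whether the disk is attached and which disk number it received.
Get-VHD -Path $vhdxPath | Select-Object Path, Attached, DiskNumber
```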

Dismount a VHDX:
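Again a minimal sketch with a hypothetical file path:

```powershell
# Hypothetical path to a VHDX that is currently mounted.
$vhdxPath = 'D:\VHDX\example.vhdx'

# Detach the virtual disk from the management operating system.
Dismount-VHD -Path $vhdxPath
```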

If you know the disk number of the mounted VHDX, you can use the -DiskNumber parameter of Dismount-VHD instead of the -Path parameter.

Copy a Physical Disk to a VHDX

There are many ways to get a physical disk into a VHDX. Be aware that just because you can convert a disk to VHDX does not mean that it can successfully boot inside a virtual machine! However, it will always be readable by any management operating system that can mount a VHDX and has the proper file system driver. These are the most common ways to convert a physical drive to VHDX:

Backup and Restore VHDX Files

The “best” way to back up and restore a virtual machine is to use a backup application specifically built for that purpose. The poor man’s way, however, involves just copying the VHDX files wherever they are needed. For a “restore”, you’ll need to attach the virtual disks to the virtual machine that will own them.

Attach an Existing VHDX to an Existing Virtual Machine or Reset VHDX File Security

Occasionally, you’ll need to (re)connect an existing VHDX file to an existing virtual machine. Sometimes, you have to rebuild the virtual machine because its XML file was damaged. I sometimes do this because I have VHDX files that I use for templates that I can usually patch offline, but sometimes need to bring online.

While not directly related to the preceding, there are times when a virtual machine loses the ability to open a VHDX. This is invariably the result of an administrator or security application removing necessary permissions from the VHDX file, often by erroneously setting inheritance from its containing folder.

Both problems have the same solution: use the attach directions. Hyper-V will automatically set the permissions as necessary when it connects the VHDX to the virtual machine.
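In PowerShell, the detach-and-reattach approach might look something like the sketch below. The virtual machine name and file path are hypothetical, and the example assumes the disk is still listed on the virtual machine. For a disk on the virtual IDE bus, the virtual machine must be off.

```powershell
# Hypothetical names; substitute your own.
$vmName   = 'demo-vm'
$vhdxPath = 'D:\VHDX\demo-vm.vhdx'

# Find the existing attachment so that its controller position can be reused.
$disk = Get-VMHardDiskDrive -VMName $vmName |
    Where-Object Path -eq $vhdxPath

if ($disk) {
    # Detach the disk. The VHDX file itself is not deleted.
    $disk | Remove-VMHardDiskDrive

    # Reattach at the same position; Hyper-V sets the file permissions as it does so.
    Add-VMHardDiskDrive -VMName $vmName `
        -ControllerType $disk.ControllerType `
        -ControllerNumber $disk.ControllerNumber `
        -ControllerLocation $disk.ControllerLocation `
        -Path $vhdxPath
}
```

To connect a VHDX that is not attached to anything yet, the Add-VMHardDiskDrive call on its own is sufficient; supply the controller values directly.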

Move a VHDX from One Bus to Another on the Same Virtual Machine

Generation 1 virtual machines have both a virtual IDE bus and a virtual SCSI bus. It’s rare, but sometimes you’ll want to move a VHDX from one to the other. The volume that contains the system bootstrapper must always be IDE 0 disk 0 (the first one), but any other disk can be moved.

You might need to do this because you accidentally placed a Windows page file on a virtual SCSI disk, which usually doesn’t work (and wouldn’t help with performance if it did, so stop that) or because you discovered the hard way that online resize operations don’t work with IDE-connected VHDX files. You can, of course, also move between different controllers and positions on the same bus type, if you have some need to do that.

Remember that you cannot modify anything on a VHDX connected to the virtual IDE bus while the virtual machine is online. The virtual SCSI bus allows for online modification.

Use the GUI to Move a VHDX to another Bus or Location

You must use Hyper-V Manager for non-clustered virtual machines and Failover Cluster Manager for clustered virtual machines. Or you could use some other application not covered here, like SCVMM.

In the relevant tool, open the settings dialog for the virtual machine and click the drive that you want to change. On the right, choose the new destination bus and location:

VHDX Bus Move

Notice that as you change the controller, the dialog contents will change automatically so that it already appears to be on the destination location before you even click OK or Apply. It will also show In Use for any location that already has an attached drive.

Use PowerShell to Move a VHDX to another Bus or Location

In PowerShell, use Set-VMHardDiskDrive to relocate a VHDX:
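A minimal sketch, assuming a hypothetical virtual machine named demo-vm with a data disk at IDE controller 0, location 1, and a free slot at SCSI controller 0, location 0. Because the source is on the virtual IDE bus, the virtual machine needs to be off.

```powershell
# Hypothetical source and destination positions; adjust to match your VM.
$move = @{
    VMName               = 'demo-vm'
    ControllerType       = 'IDE'    # current bus
    ControllerNumber     = 0
    ControllerLocation   = 1
    ToControllerType     = 'SCSI'   # destination bus
    ToControllerNumber   = 0
    ToControllerLocation = 0
}
Set-VMHardDiskDrive @move

# Verify where each disk ended up.
Get-VMHardDiskDrive -VMName 'demo-vm' |
    Select-Object Path, ControllerType, ControllerNumber, ControllerLocation
```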

Warning: If running through a remote PowerShell session, be extremely mindful about second hop and delegation concerns. While it is not obvious, the above cmdlet first detaches the drive from its current location and then attaches it at the specified destination. Far fewer permissions are required to detach than attach. It is entirely possible that you will successfully detach the VHDX… and then the cmdlet will error, leaving the disk detached.