If your environment has seen much virtual machine churn, you have almost certainly wound up with a few disconnected virtual machine files here and there. This free script will help you to locate orphaned Hyper-V VM files. For fellow infrastructure scripters, there is a special bonus script included as well.

The worst thing about my testing environment is that I often wind up with files that look legitimate, but are no longer part of any virtual machine. Even in live environments, some virtual machine moves and operations leave a trail of unwanted files. Errors and failures can produce more.

Finding virtual machine files isn’t hard, but ensuring that they’re not really attached to something can be much more difficult than it might at first seem. I tried very hard to include a great many features to cover every scenario I could think of.

Last Updated: April 11th, 2015

Feature List

  • Aware of snapshots/checkpoints
  • Aware of differencing disk chains
  • Cluster-aware
    • Will only scan Cluster Shared Volumes from a single node, and will not generate false positives while scanning other nodes
    • Will differentiate between shared and not shared storage, so you can scan local drives on nodes
    • Will detect all nodes in a cluster and scan each
    • Can be set to ignore cluster membership if you only want to scan non-shared locations
    • I didn’t do extensive testing with standard cluster disks as they’re not something I use, but it seems to work well enough. However, if a cluster disk were to move between the portion of the script where it scans for files to exclude and when it scans again for orphans, the results would be invalid.
    • The script understands raw volume identifiers, although I didn’t do a great deal of testing with those either.
  • Can be run from any system with PowerShell 4.0; the Hyper-V module is not required. I didn’t test it with PowerShell 3.0 systems because I haven’t got any left, but I think it will work.
  • Can target 2012 and 2012 R2 systems simultaneously
  • Options to scan one or all of the following locations: the host’s default VM and VHD paths, all paths currently in use by VMs, and paths that you specify
  • All scans occur directly on the target host(s) except in the case of UNC paths, and each separate host is scanned in parallel
  • Detailed built-in help. Use Get-Help with the -Full switch (for example, Get-Help .\Get-OrphanedVMFiles.ps1 -Full) to see it.

Files that the Script Scans For

  • XML files with a base name in the format of a GUID. It does not parse the XML to see if they really are Hyper-V files, so watch out for false positives.
  • BIN files with a base name in the format of a GUID.
  • VSV files with a base name in the format of a GUID.
  • SLP files with a base name in the format of a GUID followed by an intermediate extension.
  • VHD, VHDX, AVHD, AVHDX, and VFD files. If you look at the script, you’ll see that it matches anything that has VHD or VFD anywhere in the extension. Please report false positives so that I can update the script to be more selective if necessary.
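The name-matching rules above can be sketched in a few lines. This is a minimal Python illustration of the rules as described here, not the script’s actual PowerShell code; the function name is_candidate is hypothetical, and the sketch assumes that GUIDs appear without braces and that matching ignores case.

```python
import re

# Hex digits in the 8-4-4-4-12 GUID layout, without braces (an assumption).
GUID = r"[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}"

# XML, BIN and VSV files: the entire base name is a GUID.
GUID_FILE = re.compile(rf"{GUID}\.(xml|bin|vsv)", re.IGNORECASE)
# SLP files: a GUID, then one intermediate extension, then .slp.
SLP_FILE = re.compile(rf"{GUID}\.[^.]+\.slp", re.IGNORECASE)


def is_candidate(file_name):
    """Return True if the file name alone suggests a Hyper-V VM file (hypothetical helper)."""
    if GUID_FILE.fullmatch(file_name) or SLP_FILE.fullmatch(file_name):
        return True
    # Disks: any extension containing "vhd" or "vfd" (VHD, VHDX, AVHD, AVHDX, VFD).
    _base, dot, ext = file_name.rpartition(".")
    ext = ext.lower()
    return bool(dot) and ("vhd" in ext or "vfd" in ext)
```

Because it looks only at names, this sketch shares the script’s false-positive risk: any GUID-named XML file matches whether or not Hyper-V created it.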

Notes

  • All target hosts must have the Hyper-V PowerShell module installed. Any that don’t have it will throw errors that they cannot be scanned. The computer that you’re using to run the script does not need anything other than the default PowerShell modules installed. However, scanning SMB shares will be considerably faster if the Hyper-V module is installed locally.
  • The script will only exclude differencing disk chains from the point that a VM references it and upward to the root. For example, let’s say you have a “root.vhdx” with children “diff1.vhdx”, “diff2.vhdx”, and “diff3.vhdx”, with each one being a child of the previous disk. Let’s say you have a virtual machine that references “diff2.vhdx”. The scan will find the VM using “diff2.vhdx” and trace its chain upward through “diff1.vhdx” and “root.vhdx” and ignore them. If it doesn’t find any VM using “diff3.vhdx”, that file will be marked as orphaned. However, the script is snapshot/checkpoint aware so even complicated trees should not trigger a false positive.
  • The script uses a mix of implicit and explicit PowerShell Remoting with a very heavy dependence upon explicit. Even if you initiate directly from a system to be scanned, it will use PSRemoting. The drawback is that you can’t initiate this script inside a remote session unless you have CredSSP enabled and even then it might fail. To compensate, I ensured that you don’t need to have any special modules installed on the system that you run the script from.
  • You can’t specify connection credentials for scanning SMB 3 storage. I tried. It didn’t work. I didn’t get any errors to troubleshoot. It just didn’t work. If you use the -Credential parameter, the credentials will be used for retrieving host details, determining existing VM information and paths, and scanning storage local to the target hosts, but SMB 3 shares will be scanned using the credentials of the local session being used to invoke this function.
  • This script uses fully-qualified domain names to connect to domain members and you can’t get around it. The purpose of this is to ensure that cluster scans work correctly. For instance, if you pass in the short name of a cluster (“clhv1” in my case), it will use WSMan to attach to that address, query the system for its FQDN using WMI, and use that for all future connections. While it’s there, it will also determine the names of any other systems in the cluster and retrieve those as well. The side effect is that if you supply a DNS alias, it will be replaced with the true computer’s name for all subsequent connections. If you’re having trouble connecting to a system, this is probably why. Use the -Verbose switch to find out what system name it’s trying to connect to. If the target isn’t domain-joined, then the script should connect using only its NetBIOS name.
  • I had to install a lot of wiring to avoid returning false positives on Cluster Shared Volumes. The solution I settled on is that all scans of the ClusterStorage path will be handled only through the primary node in all cases. You won’t be able to bypass this behavior by any means, even by manually specifying a path of C:\ or C:\ClusterStorage. I don’t think this will be a problem, but the explanation is here just in case you see behavior that you didn’t expect, such as by setting -IgnoreClusterMembership and scanning the C: drive of a node other than the primary.
  • The way I use PowerShell Remoting in parallel might cause some scans to take longer than you would expect, even when you are only scanning the local system. That’s because of the behavior of the Wait-Job cmdlet. Since disk scans tend to be slow anyway, I don’t think this adds enough time to be harmful, but if something seems to take a few seconds longer than it should, this is probably why.
  • The script will scan UNC paths, but might have some unexpected behavior in a few cases. For instance, let’s say you have a Windows Server system serving an SMB 3 share to hold VMs for other hosts. You also decide to stand up a local VM on that host and put it in the same location. From that host’s perspective, the VM might be stored in, let’s say, D:\VMs. D:\VMs is also shared as \\storage-server\VMs. If the scan is set to look at that share location, then it’s going to return all of that virtual machine’s files as orphans because it only knows them as being registered in D:\VMs on that host.
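The differencing-disk rule from the notes above can be modeled as a walk from each referenced disk up to its root. The following is a minimal Python sketch, not the script’s implementation; find_orphans and the parent_of mapping are hypothetical, and it assumes paths have already been normalized (for example, to a single case) so that plain string comparison is reliable.

```python
def find_orphans(found, referenced, parent_of):
    """Return disks in `found` that no VM uses directly or as an ancestor.

    found      -- disk paths discovered on storage
    referenced -- disk paths that a VM or one of its checkpoints references
    parent_of  -- dict mapping each differencing disk to its parent disk
    """
    in_use = set()
    for disk in referenced:
        # Walk upward toward the root; stop early if this chain is already marked.
        while disk is not None and disk not in in_use:
            in_use.add(disk)
            disk = parent_of.get(disk)
    # Anything found on storage but never reached is a candidate orphan.
    return sorted(set(found) - in_use)
```

With the chain from the example (root.vhdx, then diff1.vhdx, diff2.vhdx, and diff3.vhdx, each a child of the previous disk) and a VM that references only diff2.vhdx, only diff3.vhdx comes back. Adding the disks that checkpoints reference to the referenced list is what keeps a checkpoint tree from producing false positives.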

There is no built-in option to remove the files that it finds. This would be a fairly trivial modification, but I’m not going to make it. I worked very hard to do almost everything I could to eliminate false positives, but they can and will happen and I won’t be responsible for someone accidentally wiping out all their templates or a completely unrelated file that happened to be named in the pattern of a Hyper-V file.

Script Usage

There are two parameter sets. The first is the default. It first queries all the virtual machines on a host to learn which files they use. It then scans all the folders that hold those files, plus the folders set as the host’s defaults, for orphans.

Default Parameter Set

  • ComputerName accepts a string array of computer names. If you don’t include any, the local system is scanned.
  • ExcludeDefaultPath prevents the host’s default paths from being scanned for orphaned files.
  • ExcludeExistingVMPaths prevents the paths of existing VMs from being scanned for orphaned files.
  • IgnoreClusterMembership treats the target system like a standalone machine even if it’s in a cluster. CSV folders will be ignored.
  • Credential allows you to specify a set of credentials that will be used to scan remote locations. Any SMB 3 shares will always be scanned from the perspective of the local system using the credentials of the local session.

Alternate Parameter Set

  • ComputerName in this set is the same as the default.
  • Path accepts a string array of paths to check. Each host in the ComputerName array will be scanned for this folder. If it’s not there, you’ll just get a warning about it, so there’s no harm in including it even for systems where you know it doesn’t exist. When you manually specify paths, only those locations are scanned by default.
  • IncludeDefaultPath instructs the script to scan the host’s default paths for orphans when -Path has been specified.
  • IncludeExistingVMPaths instructs the script to scan the folders of existing VMs for orphans when -Path has been specified.
  • IgnoreClusterMembership works in this set the same way as in the default.
  • Credential in this set is the same as in the default.
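How the two parameter sets decide which folders to scan can be summarized in a short Python sketch. The snake_case arguments are hypothetical stand-ins for -Path, -ExcludeDefaultPath, -ExcludeExistingVMPaths, -IncludeDefaultPath, and -IncludeExistingVMPaths; the case-insensitive duplicate removal is an assumption of this sketch, and cluster handling is left out entirely.

```python
def folders_to_scan(default_paths, vm_paths, path=None,
                    exclude_default_path=False, exclude_existing_vm_paths=False,
                    include_default_path=False, include_existing_vm_paths=False):
    """Return the folders to search for orphans, in first-seen order (hypothetical helper)."""
    if path is None:
        # Default set: host defaults plus existing VM folders, unless excluded.
        folders = [] if exclude_default_path else list(default_paths)
        if not exclude_existing_vm_paths:
            folders += vm_paths
    else:
        # Alternate set: only the given paths, plus anything explicitly included.
        folders = list(path)
        if include_default_path:
            folders += default_paths
        if include_existing_vm_paths:
            folders += vm_paths
    # Windows paths are case-insensitive, so de-duplicate on a normalized key.
    seen, result = set(), []
    for folder in folders:
        key = folder.rstrip("\\").lower()
        if key not in seen:
            seen.add(key)
            result.append(folder)
    return result
```

For example, calling it with only a path list returns just those paths, mirroring “only those locations are scanned by default.”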

Sample Output

Get-OrphanedVMFiles Screenshot

The output is essentially that of Get-ChildItem. Take a look at the PSComputerName column. That will tell you which computer the file was found on. If it reports “localhost”, that means the containing folder is on a share. All others will be local or on a CSV on the indicated computer.

Script Listing

Copy and paste the included script into a file. It’s written to expect Get-OrphanedVMFiles.ps1, but you’re welcome to use anything you like. Don’t forget that copying and pasting from this site includes a tag line at the very end that you’ll need to remove.

I did not write the script to be dot-sourced. If you want to do that, encase the contents as shown:

And here’s the script:

Bonus Script

The important parts of this script are contained in the body of the main script above, but I thought its contents might be useful to anyone trying to use PowerShell or .Net to work with the metadata or contents of VHD and/or VHDX files. I needed it in the above script because I promised that you could run it from any remote system, even one without the Hyper-V PowerShell module installed. Because of that, I needed some way to check VHD files to see if they are differencing disks and, if so, to retrieve their parent disk files to exclude them as orphans. To do that with Microsoft’s own PowerShell tools, you need Get-VHD, which requires not only the Hyper-V PowerShell module but also the Hyper-V role to be enabled. I didn’t think it was fair to expect you to do all that just to see if a VHD on a share is a differencing disk, so I built a function that needs only the default PowerShell modules.
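As a rough illustration of checking a disk without Get-VHD, the Python sketch below reads the format markers defined in Microsoft’s published VHD and VHDX specifications: a VHDX begins with the 8-byte signature “vhdxfile”, and a VHD ends with a 512-byte footer that starts with the cookie “conectix” and stores the disk type as a big-endian integer at byte offset 60 (4 means differencing). The function names are hypothetical, and the sketch takes any seekable binary file object rather than opening files itself.

```python
import struct

# Disk Type values from the VHD footer (offset 60, big-endian).
VHD_DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "differencing"}


def is_vhdx(f):
    """True if the file starts with the VHDX file type signature."""
    f.seek(0)
    return f.read(8) == b"vhdxfile"


def vhd_disk_type(f):
    """Return 'fixed', 'dynamic', 'differencing' or 'unknown' for a VHD; None if not a VHD."""
    end = f.seek(0, 2)            # seeking to the end returns the file size
    if end < 512:
        return None
    f.seek(end - 512)             # the footer occupies the last 512 bytes
    footer = f.read(512)
    if footer[:8] != b"conectix":
        return None
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    return VHD_DISK_TYPES.get(disk_type, "unknown")
```

For a real disk, open it in binary mode (for example, open(path, "rb")) and pass the file object. Telling whether a VHDX is a differencing disk takes more work, because that information lives in the VHDX metadata region rather than in a fixed footer.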

I only needed to know enough about the file to see what its parent VHD(X)s are, but there is more than enough example here to help someone go so far as to build a complete VHD(X) parser, if desired. I/O operations have always been one of my weakest programming skills, so I pretty much abandoned any notion of designing an efficient script and went for one that more or less maximizes readability. It works more than quickly enough for my purposes, but if you intend to build this into something meant for a heavier role, I highly recommend that you spend some time optimizing what I’m leaving you.

This script illustrates how to safely open a VHD(X) file for reading, even if it’s in use by Hyper-V. It shows how to step through the components of the file to look for the information that you need. It also includes a couple of methods for deciphering the bizarre information encoding patterns of VHD files. Look in the .NOTES section of the comment-based help for links to download the specifications for both VHD and VHDX files which can help you to design a script for whatever you’re looking for.
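To make the VHD encoding quirks concrete, here is a minimal Python sketch that finds a differencing VHD’s parent using the layout in the VHD specification: the footer’s Data Offset field (offset 16) points to a 1,024-byte dynamic disk header (cookie “cxsparse”), which holds the parent’s file name at offset 64 as UTF-16 big-endian text and eight 24-byte parent locator entries starting at offset 576. The “W2ku” (absolute) and “W2ru” (relative) locators point to path strings stored as UTF-16 little-endian, so a single header mixes both byte orders. The function name and the shape of the returned dictionary are hypothetical, and VHDX, which keeps its parent locator as key/value pairs in its metadata region, is not handled.

```python
import struct

FOOTER_SIZE = 512
HEADER_SIZE = 1024
DIFFERENCING = 4  # footer Disk Type value for a differencing disk


def vhd_parent_info(f):
    """Return parent details for a differencing VHD, or None if it has no parent."""
    # The footer is the last 512 bytes; every integer in it is big-endian.
    end = f.seek(0, 2)
    if end < FOOTER_SIZE:
        raise ValueError("file too small to be a VHD")
    f.seek(end - FOOTER_SIZE)
    footer = f.read(FOOTER_SIZE)
    if footer[:8] != b"conectix":
        raise ValueError("no VHD footer found")
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    if disk_type != DIFFERENCING:
        return None
    # Data Offset (footer offset 16) locates the dynamic disk header.
    (header_offset,) = struct.unpack_from(">Q", footer, 16)
    f.seek(header_offset)
    header = f.read(HEADER_SIZE)
    if header[:8] != b"cxsparse":
        raise ValueError("no dynamic disk header found")
    # Parent Unicode Name: 512 bytes of UTF-16 big-endian, NUL-padded.
    parent_name = header[64:576].decode("utf-16-be").split("\x00", 1)[0]
    locators = {}
    for i in range(8):
        # Entry layout: platform code, data space, data length, reserved, data offset.
        code, _space, length, _reserved, offset = struct.unpack_from(
            ">4sIIIQ", header, 576 + 24 * i)
        if code in (b"W2ku", b"W2ru") and length:
            f.seek(offset)
            # Locator paths switch to UTF-16 little-endian.
            locators[code.decode("ascii")] = f.read(length).decode("utf-16-le").rstrip("\x00")
    return {"parent_name": parent_name, "locators": locators}
```

Keeping the footer, header, and locator steps separate mirrors the readability-first approach described above; a production parser would also verify the footer and header checksums.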

Note: Even though this script is always safe to run, I’ve gotten semaphore timeouts while reading very large files at the other end of an SMB share.

Usage

  • Path is the script’s only input. It is a string parameter that takes the name of the disk file to scan. I didn’t test it, but based on my understanding of PowerShell, piping a string array that contains multiple file names to it should cause the script to run against each of them.

If the targeted file name has a parent, its complete path name will be returned as a string. If it doesn’t, the script will return nothing. The entire script is encapsulated in a single try block whose main purpose is to ensure that the targeted file is closed under any circumstances. Errors from earlier sections will be rethrown. If you call this script from one of your own, consider encapsulating it in a try block with -ErrorAction Stop.

Because this script was intended for use with the above script that does its own extensive validation, I didn’t bother with any validation of -Path. The script will fail predictably even without such checks so it’s not strictly necessary.