How to Perform Hyper-V Storage Migration

New servers? New SAN? Trying out hyper-convergence? Upgrading to Hyper-V 2016? Any number of conditions might prompt you to move your Hyper-V virtual machine’s storage to another location. Let’s look at the technologies that enable such moves.

An Overview of Hyper-V Migration Options

Hyper-V offers numerous migration options. Each has its own distinctive features. Unfortunately, we in the community often muck things up by using incorrect and confusing terminology. So, let’s briefly walk through the migration types that Hyper-V offers:

  • Quick migration: Cluster-based virtual machine migration that involves placing a virtual machine into a saved state, transferring ownership to another node in the same cluster, and resuming the virtual machine. A quick migration does not involve moving anything that most of us consider storage.
  • Live migration: Cluster-based virtual machine migration that involves transferring the active state of a running virtual machine to another node in the same cluster. A Live Migration does not involve moving anything that most of us consider storage.
  • Storage migration: Any technique that utilizes the Hyper-V management service to relocate any file-based component that belongs to a virtual machine. This article focuses on this migration type, so I won’t expand any of those thoughts in this list.
  • Shared Nothing Live Migration: Hyper-V migration technique between two hosts that does not involve clustering. It may or may not include a storage migration. The virtual machine might or might not be running. However, this migration type always includes ownership transfer from one host to another.

It Isn’t Called Storage Live Migration

I have always called this operation “Storage Live Migration”. I know lots of other authors call it “Storage Live Migration”. But, Microsoft does not call it “Storage Live Migration”. They just call it “Storage Migration”. The closest thing that I can find to “Storage Live Migration” in anything from Microsoft is a 2012 TechEd recording by Benjamin Armstrong. The title of that presentation includes the phrase “Live Storage Migration”, but I can’t determine if the “Live” just modifies “Storage Migration” or if Ben uses it as part of the technology name. I suppose I could listen to the entire hour and a half presentation, but I’m lazy. I’m sure that it’s a great presentation, if anyone wants to listen and report back.

Anyway, does it matter? I don’t really think so. I’m certainly not going to correct anyone that uses that phrase. However, the virtual machine does not necessarily need to be live. We use the same tools and commands to move a virtual machine’s storage whether it’s online or offline. So, “Storage Migration” will always be a correct term. “Storage Live Migration”, not so much. However, we use the term “Shared Nothing Live Migration” for virtual machines that are turned off, so we can’t claim any consistency.

What Can Be Moved with Hyper-V Storage Migration?

When we talk about virtual machine storage, most people think of the places where the guest operating system stores its data. That certainly comprises the physical bulk of virtual machine storage. However, it’s also only one bullet point on a list of multiple components that form a virtual machine.

Independently, you can move any of these virtual machine items:

  • The virtual machine’s core files (configuration in .xml or .vmcx, .bin, .vsv, etc.)
  • The virtual machine’s checkpoints (essentially the same items as the preceding bullet point, but for the checkpoint(s) instead of the active virtual machine)
  • The virtual machine’s second-level paging file location. I have not tested to see if it will move a VM with active second-level paging files, but I have no reason to believe that it wouldn’t
  • Virtual hard disks attached to a virtual machine
  • ISO images attached to a virtual machine

We most commonly move all of these things together. Hyper-V doesn’t require that, though. Also, we can move all of these things in the same operation but distribute them to different destinations.

What Can’t Be Moved with Hyper-V Storage Migration?

In terms of storage, we can move everything related to a virtual machine. But, we can’t move the VM’s active, running state with Storage Migration. Storage Migration is commonly partnered with a Live Migration in the operation that we call “Shared Nothing Live Migration”. To avoid getting bogged down in implementation details that are more academic than practical, just understand one thing: when you pick the option to move the virtual machine’s storage, you are not changing which Hyper-V host owns and runs the virtual machine.

More importantly, you can’t use any Microsoft tool-based technique to separate a differencing disk from its parent. So, if you have an AVHDX (differencing disk created by the checkpointing mechanism) and you want to move it away from its source VHDX, Storage Migration will not do it. If you instruct Storage Migration to move the AVHDX, the entire disk chain goes along for the ride.

Uses for Hyper-V Storage Migration

Out of all the migration types, storage migration has the most applications and special conditions. For instance, Storage Migration is the only Hyper-V migration type that does not always require domain membership. Granted, the one exception to the domain membership rule won’t be very satisfying for people that insist on leaving their Hyper-V hosts in insecure workgroup mode, but I’m not here to please those people. I’m here to talk about the nuances of Storage Migration.

Local Relocation

Let’s start with the simplest usage: relocation of local VM storage. Some situations in this category:

  • You left VMs in the default “C:\ProgramData\Microsoft\Windows\Hyper-V” and/or “C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks” locations and you don’t like it
  • You added new internal storage as a separate volume and want to re-distribute your VMs
  • You have storage speed tiers but no active management layer
  • You don’t like the way your VMs’ files are laid out
  • You want to defragment VM storage space. It’s a waste of time, but it works.

Network Relocation

With so many ways to do network storage, it’s nearly a given that we’ll all need to move a VHDX across ours at some point. Some situations:

  • You’re migrating from local storage to network storage
  • You’re replacing a SAN or NAS and need to relocate your VMs
  • You’ve expanded your network storage and want to redistribute your VMs

Most of the reasons listed under “Local Relocation” can also apply to network relocation.

Cluster Relocation

We can’t always build our clusters perfectly from the beginning. For the most part, a cluster’s relocation needs list will look like the local and network lists above. A few others:

  • Your cluster has new Cluster Shared Volumes that you want to expand into
  • Existing Cluster Shared Volumes have a data distribution that does not balance well. Remember that data access from a CSV owner node is slightly faster than from a non-owner node

The reasons matter less than the tools when you’re talking about clusters. You can’t use the same tools and techniques to move virtual machines that are protected by Failover Clustering under Hyper-V as you use for non-clustered VMs.

Turning the VM Off Makes a Difference for Storage Migration

You can perform a very simple experiment: perform a Storage Migration for a virtual machine while it’s on, then turn it off and migrate it back. The virtual machine will move much more quickly while it’s off. This behavior can be explained in one word: synchronization.

When the virtual machine is off, a Storage Migration is essentially a monitored file copy. The ability of the constituent parts to move bits from source to destination sets the pace of the move. When the virtual machine is on, all of the rules change. The migration is subjected to these constraints:

  • The virtual machine’s operating system must remain responsive
  • Writes must be properly captured
  • Reads must occur from the most appropriate source

Even if the guest operating system does not experience much activity during the move, that condition cannot be taken as a constant. In other words, Hyper-V needs to be ready for it to start demanding lots of I/O at any time.

So, the Storage Migration of a running virtual machine will always take longer than the Storage Migration of a virtual machine in an off or saved state. You can choose the convenience of an online migration or the speed of an offline migration.

Note: You can usually change a virtual machine’s power state during a Storage Migration. It’s less likely to work if you are moving across hosts.

How to Perform Hyper-V Storage Migration with PowerShell

The nice thing about using PowerShell for Storage Migration: it works for all Storage Migration types. The bad thing about using PowerShell for Storage Migration: it can be difficult to get all of the pieces right.

The primary cmdlet to use is Move-VMStorage. If you will be performing a Shared Nothing Live Migration, you can also use Move-VM. The parts of Move-VM that pertain to storage match Move-VMStorage. Move-VM has uses, requirements, and limitations that don’t pertain to the topic of this article, so I won’t cover Move-VM here.

A Basic Storage Migration in PowerShell

Let’s start with an easy one. Use this when you just want all of a VM’s files to be in one place:
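A minimal sketch of that command, assuming the Hyper-V PowerShell module is installed and you are in an elevated session on the host that owns the VM. The VM name and folder are only examples:

```powershell
# Move every file-based component of the VM "testvm" under a single root folder.
# Hyper-V creates its standard subfolders beneath the destination as needed.
Move-VMStorage -VMName 'testvm' -DestinationStoragePath 'C:\LocalVMs'
```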

This will move the virtual machine named testvm so that all of its components reside under the C:\LocalVMs folder. That means:

  • The configuration files will be placed in C:\LocalVMs\Virtual Machines
  • The checkpoint files will be placed in C:\LocalVMs\Snapshots
  • The VHDXs will be placed in C:\LocalVMs\Virtual Hard Disks
  • Depending on your version, an UndoLog Configuration folder will be created if it doesn’t already exist. The folder is meant to contain Hyper-V Replica files. It may be created even for virtual machines that aren’t being replicated.

Complex Storage Migrations in PowerShell

For more complicated move scenarios, you won’t use the DestinationStoragePath parameter. You’ll use one or more of the individual component parameters. Choose from the following:

  • VirtualMachinePath: Where to place the VM’s configuration files.
  • SnapshotFilePath: Where to place the VM’s checkpoint files (again, NOT the AVHDXs!)
  • SmartPagingFilePath: Where to place the VM’s smart paging files
  • Vhds: An array of hash tables that indicate where to place individual VHD/X files.

Some notes on these items:

  • You are not required to use all of these parameters. If you do not specify a parameter, then its related component is left alone. Meaning, it doesn’t get moved at all.
  • If you’re trying to use this to get away from those auto-created Virtual Machines and Snapshots folders, it doesn’t work. They’ll always be created as sub-folders of whatever you type in.
  • It doesn’t auto-create a Virtual Hard Disks folder.
  • If you were curious whether or not you needed to specify those auto-created subfolders, the answer is: no. Move-VMStorage will always create them for you (unless they already exist).
  • The VHDs hash table is the hardest part of this whole thing. I’m usually a PowerShell-first kind of guy, but even I tend to go to the GUI for Storage Migrations.

The following will move all components except VHDs, which I’ll tackle in the next section:
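A sketch of what such a command could look like. The VM name and the C:\LocalVMs2 destination are placeholders chosen for illustration, and the three component types do not have to share one folder:

```powershell
# Relocate configuration, checkpoint, and smart paging files.
# The virtual hard disks stay where they are because -Vhds is not specified.
Move-VMStorage -VMName 'testvm' `
    -VirtualMachinePath 'C:\LocalVMs2' `
    -SnapshotFilePath 'C:\LocalVMs2' `
    -SmartPagingFilePath 'C:\LocalVMs2'
```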

Move-VMStorage’s Array of Hash Tables for VHDs

The three …Path parameters are easy: just specify the path. The Vhds parameter is tougher. It is one or more hash tables inside an array.

First, the hash tables. A hash table is a custom object that looks like an array, but each entry has a unique name. The hash tables that Vhds expects have a SourceFilePath entry and a DestinationFilePath entry. Each must be fully-qualified for a file. A hash table is contained like this: @{ }. The name of an entry and its value are joined with an =. Entries are separated by a semicolon (;). So, if you want to move the VHDX named svtest.vhdx from \\svstore\VMs to C:\LocalVMs\testvm, you’d use this hash table:
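Written out, that hash table could look like the following. The $vhdMove variable name is only there for illustration; the two entry names are the ones that Vhds expects:

```powershell
# One hash table describing a single VHDX move: same file name, new folder.
$vhdMove = @{ 'SourceFilePath' = '\\svstore\VMs\svtest.vhdx'; 'DestinationFilePath' = 'C:\LocalVMs\testvm\svtest.vhdx' }
```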

Reading that, you might ask (quite logically): “Can I change the name of the VHDX file when I move it?” The answer: No, you cannot. So, why then do you need to enter the full name of the destination file? I don’t know!

Next, the arrays. An array is bounded by @( ). Its entries are separated by commas. So, to move two VHDXs, you would do something like this:
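A sketch of a two-disk move, assuming a VM named testvm. The second file name (svtest-data.vhdx) is hypothetical and exists only to show a second array entry:

```powershell
# Move two VHDXs in one operation.
# Parentheses bound the array; curly braces bound each hash table.
Move-VMStorage -VMName 'testvm' -Vhds @(
    @{ 'SourceFilePath' = '\\svstore\VMs\svtest.vhdx'; 'DestinationFilePath' = 'C:\LocalVMs\testvm\svtest.vhdx' },
    @{ 'SourceFilePath' = '\\svstore\VMs\svtest-data.vhdx'; 'DestinationFilePath' = 'C:\LocalVMs\testvm\svtest-data.vhdx' }
)
```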

I broke that onto multiple lines for legibility. You can enter it all on one line. Note where I used parentheses and where I used curly braces.

Tip: To move a single VHDX file, you don’t need to do the entire array notation. You can use the first example with Vhds.

A Practical Move-VMStorage Example with Vhds

If you’re looking at all that and wondering why you’d ever use PowerShell for such a thing, I have the perfect answer: scripting. Don’t do this by hand. Use it to move lots of VMs in one fell swoop. If you want to see a plain example of the Vhds parameter in action, the Get-Help examples show one. I’ve got a more practical script in mind.

The following would move all VMs on the host. All of their config, checkpoint, and second-level paging files will be placed on a share named “\\vmstore\slowstorage”. All of their VHDXs will be placed on a share named “\\vmstore\faststorage”. We will have PowerShell deal with the source paths and file names.
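One way to write such a script is sketched below. It assumes it runs in an elevated session on the host, that both shares grant the host the necessary permissions, and that each VM should land in a subfolder named after itself (a layout choice made for this sketch, not a requirement):

```powershell
# Destination shares from the description above.
$SlowStorage = '\\vmstore\slowstorage'
$FastStorage = '\\vmstore\faststorage'

foreach ($VM in Get-VM) {
    # One subfolder per VM on each share (an assumption of this sketch).
    $ConfigPath = Join-Path -Path $SlowStorage -ChildPath $VM.Name
    $VhdPath    = Join-Path -Path $FastStorage -ChildPath $VM.Name

    # Parameters that apply to every VM.
    $MoveParams = @{
        VM                  = $VM
        VirtualMachinePath  = $ConfigPath
        SnapshotFilePath    = $ConfigPath
        SmartPagingFilePath = $ConfigPath
    }

    # Build one hash table per attached virtual hard disk file,
    # reusing the existing file name at the destination.
    $VhdMoves = @(
        foreach ($Disk in ($VM | Get-VMHardDiskDrive)) {
            if ($Disk.Path -match '\.a?vhdx?$') {
                @{
                    SourceFilePath      = $Disk.Path
                    DestinationFilePath = Join-Path -Path $VhdPath -ChildPath (Split-Path -Path $Disk.Path -Leaf)
                }
            }
        }
    )

    # Only add -Vhds when the VM actually has virtual hard disk files.
    if ($VhdMoves.Count -gt 0) {
        $MoveParams['Vhds'] = $VhdMoves
    }

    Move-VMStorage @MoveParams
}
```

The file-extension check skips pass-through disks, whose Path value is not a file. Try it against a single test VM before turning it loose on a whole host.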

I used splatting for the parameters for two reasons: 1, legibility. 2, to handle VMs without any virtual hard disks.

How to Perform Hyper-V Storage Migration with Hyper-V Manager

Hyper-V Manager can only be used for non-clustered virtual machines. It utilizes a wizard format. To use it to move a virtual machine’s storage:

  1. Right-click on the virtual machine and click Move.
  2. Click Next on the introductory page.
  3. Change the selection to Move the virtual machine’s storage (the same storage options would be available if you moved the VM’s ownership, but that’s not part of this article)
    movevm_hvmwiz1
  4. Choose how to perform the move. You can move everything to the same location, you can move everything to different locations, or you can move only the virtual hard disks.
    movevm_hvmwiz2
  5. What screens you see next will depend on what you chose. We’ll cover each branch.

If you opt to move everything to one location, the wizard will show you this simple page:

movevm_hvmwiz3

If you choose the option to Move the virtual machine’s data to different locations, you will first see this screen:

movevm_hvmwiz4

For every item that you check, you will be given a separate screen where you indicate the desired location for that item. The wizard uses the same screen for these items as it does for the hard-disks-only option. I’ll show its screenshot next.

If you choose Move only the virtual machine’s virtual hard disks, then you will be given a sequence of screens where you instruct it where to move the files. These are the same screens used for the individual components from the previous selection:

movevm_hvmwiz5

After you make your selections, you’ll be shown a summary screen where you can click Finish to perform the move:

movevm_hvmwiz6

How to Perform Hyper-V Storage Migration with Failover Cluster Manager

Failover Cluster Manager uses a slick single-screen interface to move storage for cluster virtual machines. To access it, simply right-click a virtual machine, hover over Move, and click Virtual Machine Storage. You’ll see the following screen:

movecm_fcm1

If you just want to move the whole thing to one of the displayed Cluster Shared Volumes, just drag and drop it down to that CSV in the Cluster Storage heading at the lower left. You can drag and drop individual items or the entire VM. The Destination Folder Path will be populated accordingly.

As you can see in mine, I have all of the components except the VHD on an SMB share. I want to move the VHD to be with the rest. To get a share to show up, click the Add Share button. You’ll get this dialog:

movevm_fcmaddshare

The share will populate underneath the CSVs in the lower left. Now, I can drag and drop that file to the share. View the differences:

movecm_fcm2

Once you have the dialog the way that you like it, click Start.

6 Hardware Tweaks that will Skyrocket your Hyper-V Performance

Few Hyper-V topics burn up the Internet quite like “performance”. No matter how fast it goes, we always want it to go faster. If you search even a little, you’ll find many articles with long lists of ways to improve Hyper-V’s performance. The less focused articles start with general Windows performance tips and sprinkle some Hyper-V-flavored spice on them. I want to use this article to tighten the focus down on Hyper-V hardware settings only. That means it won’t be as long as some others; I’ll just think of that as wasting less of your time.

1. Upgrade your system

I guess this goes without saying but every performance article I write will always include this point front-and-center. Each piece of hardware has its own maximum speed. Where that speed barrier lies in comparison to other hardware in the same category almost always correlates directly with cost. You cannot tweak a go-cart to outrun a Corvette without spending at least as much money as just buying a Corvette — and that’s without considering the time element. If you bought slow hardware, then you will have a slow Hyper-V environment.

Fortunately, this point has a corollary: don’t panic. Production systems, especially server-class systems, almost never experience demand levels that compare to the stress tests that admins put on new equipment. If typical load levels were that high, it’s doubtful that virtualization would have caught on so quickly. We use virtualization for so many reasons nowadays, we forget that “cost savings through better utilization of under-loaded server equipment” was one of the primary drivers of early virtualization adoption.

2. BIOS Settings for Hyper-V Performance

Don’t neglect your BIOS! It contains some of the most important settings for Hyper-V.

  • C States. Disable C States! Few things impact Hyper-V performance quite as strongly as C States! Names and locations will vary, so look in areas related to Processor/CPU, Performance, and Power Management. If you can’t find anything that specifically says C States, then look for settings that disable/minimize power management. C1E is usually the worst offender for Live Migration problems, although other modes can cause issues.
  • Virtualization support: A number of features have popped up through the years, but most BIOS manufacturers have since consolidated them all into a global “Virtualization Support” switch, or something similar. I don’t believe that current versions of Hyper-V will even run if these settings aren’t enabled. Here are some individual component names, for those special BIOSs that break them out:
    • Virtual Machine Extensions (VMX)
    • AMD-V — AMD CPUs/mainboards. Be aware that Hyper-V can’t (yet?) run nested virtual machines on AMD chips
    • VT-x, or sometimes just VT — Intel CPUs/mainboards. Required for nested virtualization with Hyper-V in Windows 10/Server 2016
  • Data Execution Prevention: DEP means less for performance and more for security. It’s also a requirement. But, we’re talking about your BIOS settings and you’re in your BIOS, so we’ll talk about it. Just make sure that it’s on. If you don’t see it under the DEP name, look for:
    • No Execute (NX) — AMD CPUs/mainboards
    • Execute Disable (XD) — Intel CPUs/mainboards
  • Second Level Address Translation: I’m including this for completeness. It’s been many years since any system was built new without SLAT support. If you have one, following every point in this post to the letter still won’t make that system fast. Starting with Windows 8 and Server 2016, you cannot use Hyper-V without SLAT support. Names that you will see SLAT under:
    • Nested Page Tables (NPT)/Rapid Virtualization Indexing (RVI) — AMD CPUs/mainboards
    • Extended Page Tables (EPT) — Intel CPUs/mainboards
  • Disable power management. This goes hand-in-hand with C States. Just turn off power management altogether. Get your energy savings via consolidation. You can also buy lower wattage systems.
  • Use Hyperthreading. I’ve seen a tiny handful of claims that Hyperthreading causes problems on Hyper-V. I’ve heard more convincing stories about space aliens. I’ve personally seen the same number of space aliens as I’ve seen Hyperthreading problems with Hyper-V (that would be zero). If you’ve legitimately encountered a problem that was fixed by disabling Hyperthreading AND you can prove that it wasn’t a bad CPU, that’s great! Please let me know. But remember, you’re still in a minority of a minority of a minority. The rest of us will run Hyperthreading.
  • Disable SCSI BIOSs. Unless you are booting your host from a SAN, kill the BIOSs on your SCSI adapters. Leaving them enabled doesn’t do anything good or bad for a running Hyper-V host, but it slows down physical boot times.
  • Disable BIOS-set VLAN IDs on physical NICs. Some network adapters support VLAN tagging through boot-up interfaces. If you then bind a Hyper-V virtual switch to one of those adapters, you could encounter all sorts of network nastiness.
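Most of these settings can only be changed in the firmware itself, but Windows can report on a few of them. A minimal sketch, assuming Windows PowerShell 5.1 or later, where Get-ComputerInfo is available:

```powershell
# Report the firmware-dependent Hyper-V prerequisites as Windows sees them
# (virtualization extensions, DEP, SLAT, and whether a hypervisor is present).
# On a host where the hypervisor is already running, the requirement fields
# may come back empty because Windows no longer evaluates them.
Get-ComputerInfo -Property 'HyperV*' | Format-List
```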

3. Storage Settings for Hyper-V Performance

I wish the IT world would learn to cope with the fact that rotating hard disks do not move data very quickly. If you just can’t cope with that, buy a gigantic lot of them and make big RAID 10 arrays. Or, you could get a stack of SSDs. Don’t get six or so spinning disks and get sad that they “only” move data at a few hundred megabytes per second. That’s how the tech works.

Performance tips for storage:

  • Learn to live with the fact that storage is slow.
  • Remember that speed tests do not reflect real world load and that file copy does not test anything except permissions.
  • Learn to live with Hyper-V’s I/O scheduler. If you want a computer system to have 100% access to storage bandwidth, start by checking your assumptions. Just because a single file copy doesn’t go as fast as you think it should, does not mean that the system won’t perform its production role adequately. If you’re certain that a system must have total and complete storage speed, then do not virtualize it. The only way that a VM can get that level of speed is by stealing I/O from other guests.
  • Enable read caches
  • Carefully consider the potential risks of write caching. If acceptable, enable write caches. If your internal disks, DAS, SAN, or NAS has a battery backup system that can guarantee clean cache flushes on a power outage, write caching is generally safe. Internal batteries that report their status and/or automatically disable caching are best. UPS-backed systems are sometimes OK, but they are not foolproof.
  • Prefer few arrays with many disks over many arrays with few disks.
  • Unless you’re going to store VMs on a remote system, do not create an array just for Hyper-V. By that, I mean that if you’ve got six internal bays, do not create a RAID-1 for Hyper-V and a RAID-x for the virtual machines. That’s a Microsoft SQL Server 2000 design. This is 2017 and you’re building a Hyper-V server. Use all the bays in one big array.
  • Do not architect your storage to make the hypervisor/management operating system go fast. I can’t believe how many times I read on forums that Hyper-V needs lots of disk speed. After boot-up, it needs almost nothing. The hypervisor remains resident in memory. Unless you’re doing something questionable in the management OS, it won’t even page to disk very often. Architect storage speed in favor of your virtual machines.
  • Set your fibre channel SANs to use very tight WWN masks. Live Migration requires a hand off from one system to another, and the looser the mask, the longer that takes. With 2016 the guests shouldn’t crash, but the hand-off might be noticeable.
  • Keep iSCSI/SMB networks clear of other traffic. I see a lot of recommendations to put each and every iSCSI NIC on a system into its own VLAN and/or layer-3 network. I’m on the fence about that; a network storm in one iSCSI network would probably justify it. However, keeping those networks quiet would go a long way on its own. For clustered systems, multi-channel SMB needs each adapter to be on a unique layer 3 network (according to the docs; from what I can tell, it works even with same-net configurations).
  • If using gigabit, try to physically separate iSCSI/SMB from your virtual switch. Meaning, don’t make that traffic endure the overhead of virtual switch processing, if you can help it.
  • Round robin MPIO might not be the best, although it’s the most recommended. If you have one of the aforementioned network storms, Round Robin will negate some of the benefits of VLAN/layer 3 segregation. I like least queue depth, myself.
  • MPIO and SMB multi-channel are much faster and more efficient than the best teaming.
  • If you must run MPIO or SMB traffic across a team, create multiple virtual or logical NICs. It will give the teaming implementation more opportunities to create balanced streams.
  • Use jumbo frames for iSCSI/SMB connections if everything supports it (host adapters, switches, and back-end storage). You’ll improve the header-to-payload bit ratio by a meaningful amount.
  • Enable RSS on SMB-carrying adapters. If you have RDMA-capable adapters, absolutely enable that.
  • Use dynamically-expanding VHDX, but not dynamically-expanding VHD. I still see people recommending fixed VHDX for operating system VHDXs, which is just absurd. Fixed VHDX is good for high-volume databases, but mostly because they’ll probably expand to use all the space anyway. Dynamic VHDX enjoys higher average write speeds because it completely ignores zero writes. No defined pattern has yet emerged that declares a winner on read rates, but people who say that fixed always wins are making demonstrably false assumptions.
  • Do not use pass-through disks. The performance is sometimes a little bit better, but sometimes it’s worse, and it almost always causes some other problem elsewhere. The trade-off is not worth it. Just add one spindle to your array to make up for any perceived speed deficiencies. If you insist on using pass-through for performance reasons, then I want to see the performance traces of production traffic that prove it.
  • Don’t let fragmentation keep you up at night. Fragmentation is a problem for single-spindle desktops/laptops, “admins” that never should have been promoted above first-line help desk, and salespeople selling defragmentation software. If you’re here to disagree, you better have a URL to performance traces that I can independently verify before you even bother entering a comment. I have plenty of Hyper-V systems of my own on storage ranging from 3-spindle up to >100 spindle, and the first time I even feel compelled to run a defrag (much less get anything out of it) I’ll be happy to issue a mea culpa. For those keeping track, we’re at 6 years and counting.
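A few of the storage networking tips above map to PowerShell cmdlets. The sketch below assumes the MPIO feature is installed and uses hypothetical adapter names ('iSCSI1' and 'SMB1'). The jumbo frame value is also an assumption, since accepted values vary by driver:

```powershell
# Use Least Queue Depth as the default MPIO policy for newly claimed devices.
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LQD

# Check the adapter's current jumbo frame setting before changing it.
Get-NetAdapterAdvancedProperty -Name 'iSCSI1' -RegistryKeyword '*JumboPacket'

# Enable jumbo frames. 9014 is a common value, but confirm what your driver accepts.
Set-NetAdapterAdvancedProperty -Name 'iSCSI1' -RegistryKeyword '*JumboPacket' -RegistryValue 9014

# Enable RSS on an SMB-carrying adapter, and RDMA if the hardware supports it.
Enable-NetAdapterRss -Name 'SMB1'
Enable-NetAdapterRdma -Name 'SMB1'
```

Jumbo frames only help when every hop supports them, so verify the switches and the storage target before making the change on the host.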

4. Memory Settings for Hyper-V Performance

There isn’t much that you can do for memory. Buy what you can afford and, for the most part, don’t worry about it.

  • Buy and install your memory chips optimally. Multi-channel memory is somewhat faster than single-channel. Your hardware manufacturer will be able to help you with that.
  • Don’t over-allocate memory to guests. Just because your file server had 16GB before you virtualized it does not mean that it has any use for 16GB.
  • Use Dynamic Memory unless you have a system that expressly forbids it. It’s better to stretch your memory dollar farther than wring your hands about whether or not Dynamic Memory is a good thing. Until directly proven otherwise for a given server, it’s a good thing.
  • Don’t worry so much about NUMA. I’ve read volumes and volumes on it. Even spent a lot of time configuring it on a high-load system. Wrote some about it. Never got any of that time back. I’ve had some interesting conversations with people that really did need to tune NUMA. They constitute… oh, I’d say about .1% of all the conversations that I’ve ever had about Hyper-V. The rest of you should leave NUMA enabled at defaults and walk away.
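Enabling Dynamic Memory from PowerShell might look like the following sketch. The VM name and the byte values are placeholders, not sizing advice, and the VM must be off when Dynamic Memory is switched on or off:

```powershell
# Enable Dynamic Memory on a powered-off VM with illustrative limits.
Set-VMMemory -VMName 'fileserver1' -DynamicMemoryEnabled $true `
    -StartupBytes 2GB -MinimumBytes 1GB -MaximumBytes 8GB
```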

5. Network Settings for Hyper-V Performance

Networking configuration can make a real difference to Hyper-V performance.

  • Learn to live with the fact that gigabit networking is “slow” and that 10GbE networking often has barriers to reaching 10Gbps for a single test. Most networking demands don’t even bog down gigabit. It’s just not that big of a deal for most people.
  • Learn to live with the fact that a) your four-spindle disk array can’t fill up even one 10GbE pipe, much less the pair that you assigned to iSCSI and that b) it’s not Hyper-V’s fault. I know this doesn’t apply to everyone, but wow, do I see lots of complaints about how Hyper-V can’t magically pull or push bits across a network faster than a disk subsystem can read and/or write them.
  • Disable VMQ on gigabit adapters. I think some manufacturers are finally coming around to the fact that they have a problem. Too late, though. The purpose of VMQ is to redistribute inbound network processing for individual virtual NICs away from CPU 0, core 0 to the other cores in the system. Current-model CPUs are fast enough that they can handle many gigabit adapters.
  • If you are using a Hyper-V virtual switch on a network team and you’ve disabled VMQ on the physical NICs, disable it on the team adapter as well. I’ve been saying that since shortly after 2012 came out and people are finally discovering that I’m right, so, yay? Anyway, do it.
  • Don’t worry so much about vRSS. RSS is like VMQ, only for non-VM traffic. vRSS, then, is the projection of VMQ down into the virtual machine. Basically, with traditional VMQ, the VMs’ inbound traffic is separated across CPU cores in the management OS, but then each guest still processes its own data on vCPU 0. vRSS splits traffic processing across vCPUs inside the guest once it gets there. The “drawback” is that distributing processing and then redistributing processing causes more processing. So, the load is nicely distributed, but it’s also higher than it would otherwise be. The upshot: almost no one will care. Set it or don’t set it, it’s probably not going to impact you a lot either way. If you’re new to all of this, then you’ll find an “RSS” setting on the network adapter inside the guest. If that’s on in the guest (off by default) and VMQ is on and functioning in the host, then you have vRSS. Woohoo.
  • Don’t blame Hyper-V for your networking ills. I mention this in the context of performance because your time has value. I’m constantly called upon to troubleshoot Hyper-V “networking problems” because someone is sharing MACs or IPs or trying to get traffic from the dark side of the moon over a Cat-3 cable with three broken strands. Hyper-V is also almost always blamed by people that just don’t have a functional understanding of TCP/IP. More wasted time that I’ll never get back.
  • Use one virtual switch. Multiple virtual switches cause processing overhead without providing returns. This is a guideline, not a rule, but you need to be prepared to provide an unflinching, sure-footed defense for every virtual switch in a host after the first.
  • Don’t mix gigabit with 10 gigabit in a team. Teaming will not automatically select 10GbE over the gigabit. 10GbE is so much faster than gigabit that it’s best to just kill gigabit and converge on the 10GbE.
  • 10x gigabit cards do not equal 1x 10GbE card. I’m all for only using 10GbE when you can justify it with usage statistics, but gigabit just cannot compete.
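For the VMQ tip above, a minimal sketch using the in-box NetAdapter cmdlets. The adapter names are hypothetical, so substitute the gigabit adapters that back your virtual switch:

```powershell
# Show the current VMQ state of every adapter that exposes it.
Get-NetAdapterVmq

# Disable VMQ on the gigabit adapters (names are placeholders).
Disable-NetAdapterVmq -Name 'Ethernet 1', 'Ethernet 2'

# Confirm the change.
Get-NetAdapterVmq -Name 'Ethernet 1', 'Ethernet 2'
```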

6. Maintenance Best Practices

Don’t neglect your systems once they’re deployed!

  • Take a performance baseline when you first deploy a system and save it.
  • Take and save another performance baseline when your system reaches a normative load level (basically, once you’ve reached its expected number of VMs).
  • Keep drivers reasonably up-to-date. Verify that settings aren’t lost after each update.
  • Monitor hardware health. The Windows Event Log often provides early warning symptoms, if you have nothing else.

 

Further reading

If you carry out all (or as many as possible) of the above hardware adjustments, you will see a considerable jump in your Hyper-V performance. That I can guarantee. However, for those who don’t have the time or patience, or who aren’t prepared to make the necessary investment in some cases, Altaro has developed an e-book just for you. Find out more about it here: Supercharging Hyper-V Performance for the time-strapped admin.

How to Use Hyper-V and Kali Linux to Securely Wipe a Hard Drive


The exciting time has come for my wife’s laptop to be replaced. After all the fun parts, we’ve still got this old laptop on our hands, though. Normally, we donate old computers to the local Goodwill. They’ll clean them up and sell them for a few dollars to someone else. Of course, we have no idea who will be getting the computer, and we don’t know what processes Goodwill puts them through before putting them on the shelf. A determined attacker might be able to retrieve social security numbers, bank logins, and other things that we’d prefer to keep private. As usual, I will wipe the hard drive prior to the donation. This time though, I have some new toys to use: Hyper-V and Kali Linux.

Why Use Hyper-V and Kali Linux to Securely Wipe a Physical Drive?

I am literally doing this because I can. You can easily find any number of other ways to wipe a drive. My reasons:

  • I don’t have any experience with Windows-based apps that wipe drives and didn’t find any freebies that spoke to me
  • I don’t really want to deal with booting this old laptop up to one of those security CDs
  • Kali Linux focuses on penetration testing, but Kali is also the name of the Hindu goddess of destruction. For a bit of fun, do an Internet image search on her, but maybe not around small children. What’s more appropriate than unleashing Kali on a disk you want to wipe?
  • I don’t want to deal with a Kali Live CD any more than I want to use one of the other CD-based tools, nor do I want to build a physical Kali box just for this. I already have Kali running in a virtual machine.
  • It’s very convenient for me to connect an external 2.5″ SATA disk to my Windows 10 system.

So yeah, I’m doing this mostly for fun.

Connect the Drive

I’m assuming that you’ve already got a Hyper-V installation with a Kali Linux guest. If not, get those first.

Since we’re working with a physical drive, you also need a way to physically connect the drive to the Hyper-V host. In my case, I have an old Seagate FreeAgent GoFlex that works perfectly for this. It has an enclosure for a small SATA drive and a detachable USB-to-SATA interface connector. I just pop off its drive and plug the connector into the laptop drive, and voilà! I can connect her drive to my PC via USB.

how I connected the hard drive

You might need to come up with some other method, like cracking your case and connecting the cables. Hopefully not.

I plugged the disk into my Windows 10 system, and as expected, it appeared immediately. Next, I went into Disk Management and took the disk Offline.

hard disk management page

I then went into Hyper-V Manager and ensured the Kali guest was running. I opened its settings page to the SCSI Controller page. There, I clicked the Add button.

Adding the hard drive

It created a new logical connection and asked me if I wanted a new VHDX or to connect a physical disk. In this case, the physical disk is what we’re after.

select physical hard disk

After clicking OK, the disk immediately appeared in Kali.

In Kali, open the terminal from the launcher at the left:

Kali Linux terminal launch

Use lsblk to verify that Kali can see your disk. I already had my terminal open so that I could perform a before and after for you:

Kali Linux terminal

Remember that Linux marks the SATA disks in order as sda, sdb, sdc, etc. So, I know that the last disk that it detected is sdb, even if I hadn’t run the before and after.
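If you would rather not eyeball the two listings, the before-and-after comparison can be scripted. This is only a sketch: the new_disks helper and the sample device lists are made up for illustration. On the real guest you would feed it snapshots taken with lsblk, as the comments show.

```shell
# Print device names that appear in the "after" list but not in the "before" list.
new_disks() {
    # -v inverts the match, -x matches whole lines, -F treats the patterns as fixed strings
    grep -vxF -f "$1" "$2"
}

# On the Kali guest you would capture real snapshots, for example:
#   lsblk -dn -o NAME > before.txt   # run before attaching the disk
#   lsblk -dn -o NAME > after.txt    # run after attaching it
# The sample lists below are invented so the sketch runs anywhere.
before=$(mktemp)
after=$(mktemp)
printf 'sda\nsr0\n' > "$before"
printf 'sda\nsdb\nsr0\n' > "$after"

new_disks "$before" "$after"
```

Whatever name it prints is the disk that just arrived, which is the one you want to wipe and not your system disk.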

Use shred to Perform the Wipe

Now that we’ve successfully connected the drive, we only need to perform the wipe. We’ll use the “shred” utility for that purpose. On other distributions, you’d usually need to install that from a repository. Kali already has it waiting for you, of course.

The shred utility has a number of options. Use shred --help to view them all. In my case, I want to view progress and I want to increase the number of passes from the default of 3 to 4. I’ve been told that analog readers can sometimes go as far as three layers deep. Apparently, even that is untrue. It seems that a single pass will do the trick. However, old paranoia dies hard. So, four passes it is.

I used:

Kali Linux
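The exact command survives only in the screenshot, so here is a hedged sketch of those options rather than a transcript. It assumes GNU shred and runs against a throwaway scratch file so that you can try the flags safely. The /dev/sdb device name in the final comment is an assumption based on the lsblk output described earlier.

```shell
# Scratch file standing in for the disk: 1 MiB of random data.
scratch=$(mktemp)
head -c 1048576 /dev/urandom > "$scratch"
before=$(cksum < "$scratch")

# --verbose reports progress; --iterations=4 overwrites four times instead of the default three.
shred --verbose --iterations=4 "$scratch"

after=$(cksum < "$scratch")
if [ "$before" != "$after" ]; then
    echo "contents overwritten"
fi

# Against the attached disk the equivalent would be (destructive, assumed device name):
#   sudo shred --verbose --iterations=4 /dev/sdb
```

Double-check the device name before running the real thing. shred will not ask for confirmation.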

And then, I found something else to do. As you can imagine, overwriting every spot on a 250GB laptop disk takes quite some time.

Because of the time involved, I needed to temporarily disable Windows 10 sleep mode. Otherwise, Connected Standby would interrupt the process.

disabling sleep mode

After the process completed, I used Hyper-V Manager to remove the disk from the VM. Since I never mounted it in Kali, I didn’t need to do anything special there. After that, I bolted the drive back into the laptop. It’s on its way to its happy new owner, and I don’t need to worry about anyone stealing our information from it.

How to Avoid NTFS Permissions Problems During Hyper-V Live Migration


The title of this article describes the symptoms fairly well. You Live Migrate a virtual machine that’s backed by SMB storage, and the permissions shift in a way that prevents the virtual machine from being used. You’d have to be fairly sharp-eyed to notice before it causes problems, though. I didn’t catch on until virtual machines started failing because the hosts didn’t have sufficient permissions to start them. I don’t have a true fix, meaning that I can’t prevent the permissions from changing. However, I can show you how to eliminate the problem.

The root problem also affects local and Cluster Shared Volume locations, although the default permissions generally prevent blocking problems from manifesting.

I have experienced the problem on both 2012 R2 and 2016. The Hyper-V host causes the problem, so the operating system running on the SMB system doesn’t matter.

Symptom of Broken NTFS Permissions for Hyper-V

I discovered the problem when one of my nodes went down for maintenance and all of its virtual machines crashed. It only affected my test cluster, which I don’t keep a close eye on. That means that I can’t tell you when this became a problem. I do know that this behavior is fairly new (sometime in late 2016 or 1Q/2Q 2017).

Symptom 1: Cluster event logs will fill up with the generic access denied (0x80070005) message.

For example, Hyper-V-VMMS; Event ID 20100:

Hyper-V-High-Availability; Event ID 21502:

You will also have several of the more generic FailoverClustering IDs 1069, 1205, and 1254 and Hyper-V-High-Availability IDs 21102 and 21111 as the cluster service desperately tries to sort out the problem.

Symptom 2: Virtual machines disappear from Hyper-V Manager on all nodes while still appearing in Failover Cluster Manager.

Because the cluster can’t register the virtual machine ID on the target Hyper-V host, you won’t see it in Hyper-V Manager. The cluster still knows about it though. Remember that, even if they’re named the same, the objects that you see as Roles in Failover Cluster Manager are different objects than what you see in Hyper-V Manager. Don’t panic! As long as the cluster still knows about the objects, it can still attempt to register them once you’ve addressed the underlying problem.

What Happened?

I’m guessing that “helper” behavior gone awry has caused unintentional problems. When you Live Migrate a virtual machine, Hyper-V tries to “fix” permissions, even when they’re not broken. It adjusts the NTFS permissions for the host.

The GUI ACL looks like this:

broken ntfs settings

The permission level that I set, and that I counsel everyone to set, is Full Control. As you can see, it’s been reduced. We click Advanced as the first investigative step and see:

broken ntfs advanced settings

The Access still only tells us Special, but we can see that inheritance did not cause this. Whatever changes the permissions is making the changes directly on this folder. This is the same folder that’s shared via SMB. Double-clicking the entry and then clicking the Show advanced permissions link at the right shows us the new permission set:

broken ntfs new permissions

When I first found the permissions in this condition, I thought, “Huh, I wonder why/when I did that?” Then I set Full Control again. After the very next Live Migration, these permissions were back! Once I discovered that behavior, I tested other Live Migration types, such as using Cluster Shared Volumes. It does occur on those as well. However, the default permissions on CSVs have other entries that ensure that this particular issue does not prevent virtual machines from functioning. VMs on SMB shares don’t automatically have that kind of luck — but they can benefit from a similar configuration.

Permanently Correcting Live Migration NTFS Permission Problems

I don’t know why Hyper-V selects these particular permissions. I don’t know precisely which of those unchecked boxes cause these problems.

I do know how to prevent the problem from adversely affecting your virtual machines. In fact, even in the absence of the problem, I would label this as a “best practice” because it reduces overall administrative effort.

  1. In Active Directory (I’ll use Active Directory Users and Computers; you could also use PowerShell), create a new security group. For my test environment, I call mine “Hyper-V Hosts”. In a larger domain, you’ll likely want more granular groups.
    broken ntfs group
  2. Select all of the Hyper-V hosts that you want in that new group. Right-click them and click Add to group.
    brokenntfs_hostlistadd
  3. In the Select Groups dialog, enter or browse to the group that you just created. Click OK to add them.
    broken ntfs group
  4. Restart the Workstation service on each of the Hyper-V hosts.
  5. On the target SMB system, add the new group to the ACL of the folder at the root of the share. I personally recommend that you change both SMB and NTFS permissions, although the problem only manifests on NTFS. Grant the group Full Control.
    broken ntfs virtual machines

You will now be able to Live Migrate and start virtual machines from this SMB share. If your virtual machines disappeared from Hyper-V Manager, use Failover Cluster Manager to start and/or Live Migrate them. It will take care of any missing registrations.

Why Does this Work?

Through group permissions, the same object can effectively appear multiple times in a single NTFS ACL (access control list). When that happens, NTFS grants the least restrictive set of permissions. So, while SVHV1’s specific ACE (access control entry) excludes Write attributes, the Hyper-V Hosts group’s ACE includes it. When NTFS accumulates all possible permissions that could apply to SVHV1, it will find an Allow entry for the Write attributes property (and others not set on the ACE specific to SVHV1). If it found a Deny anywhere, that would override any conflicting Allow. However, there are no Deny settings, so that single Allow wins.
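To make the accumulation rule concrete, here is a toy model of it. It is a deliberate simplification of what the paragraph above describes: the effective_rights function and the right names are invented for illustration, and real NTFS evaluation also takes inheritance order into account, which this sketch ignores.

```shell
# Union of the Allow rights from every ACE that applies, minus anything denied.
effective_rights() {
    allows=$1   # space-separated rights granted by all matching ACEs
    denies=$2   # space-separated rights denied by any matching ACE
    for right in $(printf '%s\n' $allows | sort -u); do
        case " $denies " in
            *" $right "*) ;;                 # a Deny overrides any conflicting Allow
            *) printf '%s\n' "$right" ;;
        esac
    done
}

host_ace="Read Execute"                 # reduced ACE for the host itself (illustrative labels)
group_ace="Read Execute Write Delete"   # ACE for the hosts group (illustrative labels)

# No Deny entries, so the group's extra Allows carry through.
effective_rights "$host_ace $group_ace" ""
```

The host’s own entry can shrink as much as Hyper-V likes; as long as the group entry is intact and nothing is denied, the combined result still includes everything the group grants.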

Do remember that when a computer accesses an NTFS folder through an SMB share, the permissions on that share must be at least as permissive as NTFS in order for access to work as expected. So, if the SMB permission only allows Read, then it won’t matter that the NTFS allows Full Control. When NTFS permissions and SMB permissions must be evaluated together, the most restrictive cumulative effect applies. I’m mostly telling you this for completeness; Hyper-V will not modify SMB permissions. If they worked before, they’ll continue to work. However, I do recommend that you add the same group with Full Control permissions to the share.

As I mentioned before, I recommend that you adopt the group membership tactic whether you need it or not. When you commission new Hyper-V hosts, you’ll only need to add them to the appropriate groups for SMB access to work automatically. When you decommission servers, you won’t need to go around cleaning up broken SID ACEs.

How to Optimize Hyper-V Performance for Dell PowerEdge T20


 

A little while back, we published an eBook detailing how to build an inexpensive Hyper-V cluster. At that price point, you’re not going to find anything that breaks performance records. Such a system could meet the needs of a small business, though. For those of you lucky enough to have a more substantial budget, it also works well as a cheap test lab. Whatever your usage, the out-of-box performance can be improved.

The steps in this article were written using the hardware in the previously linked eBook. If you have a Dell T20 that uses a different build, you may not have access to the same options. You may also need to look elsewhere for guidance on configuring additional hardware that I do not have.

A little upfront note: Never expect software or tweaks to match better hardware. If you expect a few switches and tips to turn a T20 into competition for a latest generation PowerEdge R-series, you will leave disappointed. I am always amazed by people that buy budget hardware and then get upset because it acts like budget hardware. If you need much better performance, break out your wallet and buy better hardware.

Step 1: Disable C-States

The number one thing you should always do on all systems to improve Hyper-V performance: disable C-States. You make that change in the system’s BIOS. The T20’s relevant entry appears below. Just clear the box.

t20perf_cstates

I also recommend that you disable SpeedStep, although you probably won’t gain much by doing so.

Step 2: Update Drivers

I know, I know, updating drivers is the oldest of all so-called “performance enhancement” cliches. Bear with me, though. All of the hardware works just fine with Windows default drivers, but the drivers unlock some options that you’ll need.

Start at https://support.dell.com. You’ll be asked for the system’s service tag. At an elevated PowerShell prompt, enter gwmi win32_bios and look at the SerialNumber line:

t20perf_servicetag

Highlight and press [Enter] to copy it to the clipboard.

Select the Drivers and Downloads tab, then locate the Change OS link so that you can select the correct operating system. Dell periodically changes their support site, so you may see something different, but these named options have been the same for a while:

t20perf_driversystem

Items that you want:

  • BIOS (reboots without asking; stop your VMs first)
  • Chipset
  • Intel(R) Management Engine Components Installer
  • Intel Rapid Storage Technology Driver and Management Console
  • Intel Rapid Storage Technology F6 Driver

After gathering those files, go to Intel’s support site: https://downloadcenter.intel.com/.

This site also changes regularly. What I did was search for the “I217-LM”. On its list of downloads, I found the Intel Ethernet Adapter Connections CD. That includes drivers for just about every Intel network adapter in existence. If you have the system build that I described in the eBook, this file will update the onboard adapter and the add-in PRO/1000 PTs (and any other Intel network adapters that you might have chosen).

If you’re targeting a GUI-less system, unblock the files. An example:

If you prefer the mouse, then you can use each item’s individual property dialog instead.

Also make sure that you use a GUI system to unzip the Intel CD prior to attempting to install on a GUI-less system.

I’m sure you can install drivers without my help. Do that and read on.

Step 3: Networking Performance Tweaks

Three things you want to do for networking:

  1. Enable jumbo frames for storage adapters
  2. Disable power management
  3. Disable VMQ on any team adapters

Enabling Jumbo Frames

First, make sure that jumbo frames are enabled on your physical switch. It’s always OK for a system to use smaller frames on equipment that has larger frames enabled. The other way around usually just causes excessive fragmentation. That hurts performance, but things still work. Sometimes, it causes Ethernet frames to never be delivered. Always configure your switch first. Many require a power cycle for the change to take effect.

Once jumbo frames are set on your switch, make the change on the T20’s physical adapters. You can make the change in the GUI or in PowerShell.

Enabling Jumbo Frames via the GUI

  1. In Network Connections, access an Intel adapter’s property sheet.
  2. Click the Configure button.
  3. Switch to the Advanced tab.
  4. Set Jumbo Packet to its highest number; usually 9014.

When you install the Intel network drivers and management pack, the I217-LM driver page will look like the following:

t20perf_i217jumbo

Intel adapters not under management will look like this:

t20perf_regularjumbo

Enabling Jumbo Frames in PowerShell

PowerShell makes this fast and simple:

Disabling Network Adapter Power Management

Windows likes to turn off network adapters. Unfortunately, it doesn’t always do the best job ensuring that you’re not still using it. You can disable power management using the GUI or PowerShell.

Disabling Network Adapter Power Management in the GUI

Navigate to the adapter’s properties like you did to enable jumbo frames. This time, go to the Power Management tab. For a device under the control of the Intel management system, just uncheck Reduce link speed during system idle.

t20perf_i217speedreduce

For adapters using default drivers, uncheck Allow the computer to turn off this device to save power:

t20perf_regularnetpm

Disabling Network Adapter Power Management in PowerShell

The process is a bit more involved in PowerShell, but I’ve done the hard work for you. Just copy/paste into an elevated PowerShell prompt or run as a script:

Disable VMQ on Team Adapters

None of the adapters included with this system or in the eBook build support VMQ. That’s good because I don’t know of any manufacturers that properly implement VMQ on gigabit adapters. However, if you create a native Microsoft LBFO team, VMQ will be enabled on it. Whether or not it does anything… I don’t know. I do know that I seemed to clear up some strange issues when I disabled it on 2012. So, I’ve been doing it ever since. It’s quick and easy, so even if it doesn’t help, it certainly won’t hurt.

Note: If you are using the build from the eBook, only follow this section on the Hyper-V hosts. The storage server won’t use VMQ anyway.

Disabling VMQ on Team Adapters Using the GUI

Find the team adapter in Network Connections. It should be quite obvious, since the icon shows two physical adapters. Its description field will say Microsoft Network Adapter Multiplexor Driver.

t20perf_teamadapter

Open it up and get into its adapter configuration properties just as you did for the physical adapters above. Switch to the Advanced tab. Find the Virtual Machine Queues entry and set it to Disabled:

t20perf_teamadapterkillvmq

Disabling VMQ on Team Adapters in PowerShell

PowerShell can make short work of this task:

Step 4: Storage Performance Tweaks

The disks in these systems are slow. Nothing will change that. But, we can even out their transfer patterns a bit.

Changing the Disk Cache Mode for the Hyper-V Hosts

The Hyper-V hosts don’t do a great deal of disk I/O. In my personal configuration, I do place my domain controllers locally. However, for any domain these systems could reasonably handle, the domain controllers will perform very little I/O. We’ll enable the read cache on these systems. It will help, but you may not see much improvement due to their normal usage pattern.

Note: I have not attempted any of this on a GUI-less system. If the graphical interface works, you’ll find its exe at: “C:\Program Files\Intel\Intel(R) Rapid Storage Technology\IAStorUI.exe”.

Under the Intel Start menu entry, open Intel® Rapid Storage Technology. Switch to the Performance tab. You could disable the Link Power Management, but it’s not going to help much on the Hyper-V hosts. Change the Cache mode to Read only.

t20perf_hvstoragecache

Changing the Disk Cache Mode for the Storage Host

The storage server does most of the heavy lifting in this build. We can set some stronger caching policies that will help its performance.

Warning: These steps are safe only if you have a battery backup system that will safely shut down the system in the event of a power outage. As shipped, these systems do not have an internal battery backup for the RAID arrays. You can purchase add-on cards that provide that functionality. My system has one external battery that powers all three hosts. However, its USB interface connects only to the storage system. Do not follow these steps for your Hyper-V hosts unless you have a mechanism to gracefully shut them down in a power outage.

Follow the same steps to access the Intel® Rapid Storage Technology’s Performance tab as you did on the Hyper-V hosts. This time, disable the power management option, enable write-cache buffer flushing, and set the cache mode to Write back:

t20perf_batterystoragecache

Microsoft’s Tuning Guide

At this point, you’ve made the best improvements that you’re likely to get with this hardware. However, Microsoft publishes tuning guides that might give you a bit more.

Access the 2016 tuning guide: https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/index

The 2016 guide doesn’t contain very many instructions to follow; it contains a great deal of information. Aside from changing the power plan to High Performance, you won’t find much to do.

The 2012 R2 guide contains more activities: https://msdn.microsoft.com/en-us/library/windows/hardware/dn567657(v=vs.85).aspx. I do not know how many of these settings are still honored in 2016. I do know that any further changes that you make based on this guide involve trade-offs. For instance, you can disable the I/O balancer; that might speed up I/O for one VM that feels slow, but at the cost of allowing storage bottlenecks.

Test

After any performance change, test things out. You shouldn’t expect to see any Earth-shattering improvements. You definitely don’t want things to become worse. If any issues occur, retrace your steps and undo changes until performance returns — if it returns. It’s not uncommon for performance tweaking to uncover failing hardware. It’s best to carry out these changes soon after you spin up your new equipment for that reason.

Testing Jumbo Frames

Verify that jumbo frames work by pinging a target IP connected via a physical switch, using the following form: ping -f -l 8000 <target IP>

If pings drop (but normal pings go through) or you receive a message that says, “Packet needs to be fragmented but DF set.”, then something along the way does not support jumbo frames.

The “8000” number doesn’t need to be exact, but it must be large enough to ensure that you are sending a jumbo Ethernet frame (somewhere in the 6000s and over). Due to variances in the way that “jumbo” can be calculated, the displayed “9014” will almost never work. Usually, Windows will send an unfragmented ping no larger than 8972 bytes.
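The gap between “9014” and “8972” is just header arithmetic. The sketch below works it through, assuming a 9000-byte IP MTU, an IPv4 header without options, and the standard ICMP echo header. The variable names are mine.

```shell
mtu=9000          # IP MTU usually meant by "jumbo" (assumption)
eth_header=14     # Ethernet header that the driver's "9014" figure includes
ip_header=20      # IPv4 header without options
icmp_header=8     # ICMP echo header

echo "driver setting: $((mtu + eth_header))"
echo "largest unfragmented ping payload: $((mtu - ip_header - icmp_header))"
```

On Windows, ping’s -f switch sets the Don’t Fragment flag and -l sets the payload size, so 8972 is the ceiling for -l on a path with a 9000-byte MTU.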

Verify Settings After Driver Updates

Some driver updates return settings to defaults. You might lose jumbo frames and power management settings, for instance. It’s tempting to automate driver settings, but network setting changes cause network transmission interrupts. You’re better off performing manual verification.

 

4 Ways to Transfer Files to a Linux Hyper-V Guest


You’ve got a straightforward problem. You have a file on your Windows machine. You need to get that file into your Linux machine. Your Windows machine runs Hyper-V, and Hyper-V runs your Linux machine as a guest. You have many options.

Method 1) Use PowerShell and Integration Services

This article highlights the PowerShell technique as it’s the newest method, and therefore the least familiar. You’ll want to use this method when the Windows system that you’re working from hosts the target Linux machine. I’ll provide a longer list of the benefits of this method after the how-to.

Prerequisite for Copying a File into a Linux Guest: Linux Integration Services

The PowerShell method that I’m going to show you makes use of the Linux Integration Services (LIS). It doesn’t work on all distributions/versions. Check for your distribution on TechNet. Specifically, look for “File copy from host to guest”.

By default, Hyper-V disables the particular service that allows you to transfer files directly into a guest.

Enabling File Copy Guest Service in PowerShell

The cmdlet to use is Enable-VMIntegrationService. You can just type it out:

The Name parameter doesn’t work with tab completion, however, so you need to know exactly what to type in order to use that syntax.

You can use Get-VMIntegrationService for spelling assistance:

Enable-VMIntegrationService includes a VMIntegrationService parameter that accepts an object, which can be stored in a variable or piped from Get-VMIntegrationService:

You could leave out the entire where portion and pipe directly in order to enable all services for the virtual machine in one shot.

Use whatever method suits you best. You do not need to power cycle the virtual machine or make any other changes.

Enabling File Copy Guest Service in Hyper-V Manager or Failover Cluster Manager

If you’d prefer to use a GUI, either Hyper-V Manager or Failover Cluster Manager can help. To enable file copy for a guest, open the Settings dialog for the virtual machine. It does not matter which tool you use. The virtual machine can be On or Off, but it cannot be Saved or Paused.

In the dialog, switch to the Integration Services tab. Check the box for Guest services and click OK.

lincopy_servicescheck

You do not need to power cycle the virtual machine or make any other changes.

Verifying the Linux Guest’s File Copy Service

You can quickly check that the service in the guest is prepared to accept a file from the host:

Look in the output for hypervfcopyd:

lincopy_serviceverify

Of course, you can supply more of the name to grep than just “hyper” to narrow it down, but this is easier to remember.
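If you want a yes/no answer instead of scanning the output yourself, something like the following would do. It is a sketch: the has_fcopy_daemon helper and the sample process listing are invented, and the daemon name is the one shown above, which may differ on your distribution.

```shell
# Report whether the Hyper-V file copy daemon appears in a process listing read from stdin.
has_fcopy_daemon() {
    # The [h] bracket keeps this grep's own command line from matching
    # when the listing comes from ps.
    if grep -q '[h]ypervfcopyd'; then
        echo "file copy daemon running"
    else
        echo "file copy daemon not found"
    fi
}

# In the guest you would run: ps -ef | has_fcopy_daemon
# Sample listing (made up) so the sketch runs anywhere:
printf 'root 612 1 0 10:02 ? 00:00:00 /usr/sbin/hypervkvpd\nroot 640 1 0 10:02 ? 00:00:00 /usr/sbin/hypervfcopyd\n' | has_fcopy_daemon
```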

Using Copy-VMFile to Transfer a File into a Linux Guest

All right, now the prerequisites are out of the way. Use Copy-VMFile:

You can run Copy-VMFile remotely:

Notice that SourcePath must be from the perspective of ComputerName. Tab completion won’t work remotely, so you’ll need to know the precise path of the source file. It might be easier to use Enter-PSSession first so that tab completion will work.

You can create a directory on the Linux machine when you copy the file:

CreateFullPath can only create one folder. CreateFullPath is a switch, so the folder to create comes from DestinationPath. If you ask it to create a directory tree (ex: -DestinationPath '/downloads/new' -CreateFullPath), you’ll get an error that includes the text “failed to initiate copying files to the guest: Unspecified error (0x80004005)”.

Benefits and Notes on Using Copy-VMFile for Linux Guests

Some reasons to choose Copy-VMFile over alternatives:

  • I showed you how to use it with the VMName parameter, but Copy-VMFile also accepts VM objects. If you’ve saved the output into a variable from Get-VM or some other cmdlet that produces VM objects, you can use that variable with Copy-VMFile’s VM parameter instead of VMName.
  • The VMName and VM parameters accept arrays, so you can copy a file into multiple virtual machines simultaneously.
  • You do not need a functioning network connection within the Linux guest or between the host and the guest.
  • You do not need to open firewalls or configure any daemons inside the Linux guest.
  • The transfer occurs over the VMBus, so only your hardware capabilities can limit its speed.
  • The transfer operates under the root account, so you can place a file just about anywhere on the target system.

Notes:

  • As mentioned in the last item of the preceding list, this process runs as root. Be careful what you copy and where you place it.
  • Copied files are marked as executable for some reason.
    lincopy_executable
  • Copy-VMFile only works from host to guest. The existence of the FileSource parameter implies that you can copy files in the other direction, but that parameter accepts no value other than Host.

Method 2) Using WinSCP

I normally choose WinSCP for moving files to/from any Linux machine, Hyper-V guest or otherwise.

If you choose the SCP protocol when connecting to a Linux system, it will work immediately. You won’t need to install any packages first:

lincopy_scp

Once connected, you have a simple folder display for your local and target machines with simple drag and drop transfer functionality:

lincopy_winscpmain

You can easily modify the permissions and execute bit on a file (as long as you have permission):

lincopy_fileprops

You can use the built-in editor on a file or attach it to external editors. It will automatically save the output from those editors back to the Linux machine:

lincopy_edit

You can even launch a PuTTY session right from WinSCP (if PuTTY is installed):

lincopy_putty

I still haven’t found all of the features of WinSCP.

Method 3) Move Files to/from Linux with the Windows FTP Client

Windows includes a command-line ftp client. It has many features, but still only qualifies as barely more than rudimentary. You can invoke it with something like:

The above will attempt to connect to the named host and will then start an interactive session. If you’d like to start work from within an interactive session, that would look something like this:

Use ftp /? at the command prompt for command-line assistance and help at the interactive ftp> prompt for interactive assistance.

You’ll run into one problem using this or any other standard FTP client: most Linux distributions do not ship with any FTP daemon running. Most distributions allow you to easily acquire vsftpd. I don’t normally do that because SCP is already enabled and it’s secure.

Method 4) Move Files Between Linux Guests with a Transfer VHDX

If you have a distribution that doesn’t work with Copy-VMFile, or you just don’t want to use it, you can use a portable VHDX file instead.

  1. First, create a disk. Use PowerShell so that the sparse files don’t cause the VHDX file to grow larger than necessary:

  2. Attach the VHDX to the Linux guest. If you attach to the virtual SCSI chain, you don’t need to power down the VM.
  3. Inside the Linux guest, create an empty mount location.
  4. Determine which of the attached disks can be used for transfer with sudo fdisk -l. You are looking for a /dev/sd* item that has FAT32 partition information.
    Do not use:
    lincopy_diskinuse
    Use:
    lincopy_disknotinuse
  5. Enter the following as shown. Outputs will show you what you’re doing; I’m only telling you what to type:
  6. Run sudo fdisk -l to verify that your new disk now has a W95 FAT32 partition. You need FAT32 as it’s the only file system that both Linux and Windows can use without extra effort that’s not worth it for a transfer disk.
  7. Format your new partition:

You have successfully created your transfer disk.

Use a Transfer Disk in Linux

To use a transfer disk on the Linux side, you need to attach it to the Linux machine. Then you need to mount it:

  1. Use sudo fdisk -l to verify which device Linux has assigned the disk to. Use the preceding section for hints.
  2. Once you know which device it is, mount it to your transfer mount point: mount /dev/sdb1 /transfer. Move/copy files into/out of the /transfer folder.
  3. Once you’re finished, unmount the disk from the folder, either by mount point (umount /transfer) or by device (umount /dev/sdb1).
  4. Detach the VHDX from the virtual machine.

Use a Linux Transfer Disk in Windows

You mount a VHDX in Windows via Mount-VHD (must be running Hyper-V), Mount-DiskImage, or Disk Management. Once mounted, work with it as you normally would. All three attach the disk so that Windows can assign its volume a drive letter. Once you’re finished working with it, you can use Dismount-VHD, Dismount-DiskImage, or Disk Management.

Be aware that even though Windows should have no trouble reading a FAT32 partition/volume created in Linux, the opposite is not true! Do not use Windows formatting tools for a Linux transfer disk! Your mileage may vary, but formatting in Linux always works, so stick to that method.

 

Hyper-V Backup Best Practices: Terminology and Basics

One of my very first jobs performing server support on a regular basis was heavily focused on backup. I witnessed several heart-wrenching tragedies of permanent data loss but, fortunately, played the role of data savior much more frequently. I know that most, if not all, of the calamities could have at least been lessened had the data owners been more educated on the subject of backup. I believe very firmly in the value of a solid backup strategy, which I also believe can only be built on the basis of a solid education in the art. This article’s overarching goal is to give you that education by serving a number of purposes:

  • Explain industry-standard terminology and how to apply it to your situation
  • Address and wipe away 1990s-style approaches to backup
  • Clearly illustrate backup from a Hyper-V perspective

Backup Terminology

Whenever possible, I avoid speaking in jargon and TLAs/FLAs (three-/four-letter acronyms) unless I’m talking to a peer that I’m certain has the experience to understand what I mean. When you start exploring backup solutions, you will have these tossed at you rapid-fire with, at most, brief explanations. If you don’t understand each and every one of the following, stop and read those sections before proceeding. If you’re lucky enough to be working with an honest salesperson, it’s easy for them to forget that their target audience may not be completely following along. If you’re less fortunate, it’s simple for a dishonest salesperson to ridiculously oversell backup products through scare tactics that rely heavily on your incomplete understanding.

  • Backup
  • Full/incremental/differential backup
  • Delta
  • Deduplication
  • Inconsistent/crash-consistent/application-consistent
  • Bare-metal backup/restore (BMB/BMR)
  • Disaster Recovery/Business Continuity
  • Recovery point objective (RPO)
  • Recovery time objective (RTO)
  • Retention
  • Rotation — includes terms such as Grandfather-Father-Son (GFS)

There are a number of other terms that you might encounter, although these are the most important for our discussion. If you encounter a vendor making up their own TLAs/FLAs, take a moment to investigate their meaning in comparison to the above. Most are just marketing tactics — inherently harmless attempts by a business entity trying to turn a coin by promoting its products. Some are more nefarious — attempts to invent a nail for which the company just conveniently happens to provide the only perfectly matching hammer (with an extra “value-added” price, of course).

Backup

This heading might seem pointless — doesn’t everyone know what a backup is? In my experience, no. In order to qualify as a backup, you must have a distinct, independent copy of data. A backup cannot have any reliance on the health or well-being of its source data or the media that contains that data. Otherwise, it is not a true backup.

Full/Incremental/Differential Backups

Recent technology changes and their attendant strategies have made this terminology somewhat less popular than in past decades, but it is still important to understand because it is still in widespread use. They are presented in a package because they make the most sense when compared to each other. So, I’ll give you a brief explanation of each and then launch into a discussion.

  • Full Backups: Full backups are the easiest to understand. They are a point-in-time copy of all target data.
  • Differential Backups: A differential backup is a point-in-time copy of all data that is different from the last full backup that is its parent.
  • Incremental Backups: An incremental backup is a point-in-time copy of all data that is different from the backup that is its parent.

The full backup is the safest type because it is the only one of the three that can stand alone in any circumstances. It is a complete copy of whatever data has been selected.

Full Backup

A differential backup is the next safest type. Remember the following:

  • To fully restore the latest data, a differential backup always requires two backups: the latest full backup and the latest differential backup. Intermediary differential backups, if any exist, are not required.
  • It is not necessary to restore from the most recent differential backup if an earlier version of the data is required.
  • Depending on what data is required and the intelligence of the backup application, it may not be necessary to have both backups available to retrieve specific items.

The following is an illustration of what a differential backup looks like:

Differential Backup

Each differential backup goes all the way back to the latest full backup as its parent. Also, notice that each differential backup is slightly larger than the preceding differential backup. That steady growth is the conventional wisdom on the matter. In theory, each differential backup contains the previous backup’s changes as well as any new changes. In reality, it truly depends on the change pattern. A file backed up on Monday might have been deleted on Tuesday, so that part of the backup certainly won’t be larger. A file that changed on Tuesday might have had half its contents removed on Wednesday, which would make that part of the backup smaller. A differential backup can range anywhere from essentially empty (if nothing changed) to as large as the source data (if everything changed). Realistically, you should expect each differential backup to be slightly larger than the previous one.

The following is an illustration of an incremental backup:

Incremental Backup

Incremental backups are best thought of as a chain. The above shows a typical daily backup in an environment that uses a weekly full with daily incrementals. If all data is lost and restoring to Wednesday’s backup is necessary, then every single night’s backup from Sunday’s full through Wednesday’s incremental will be necessary. If any one is missing or damaged, then it will likely not be possible to retrieve anything from that backup or any backup afterward. Therefore, incremental backups are the riskiest; they are also the fastest and consume the least amount of space.
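To make the difference in restore requirements concrete, here is a minimal Python sketch. It does not reflect how any particular backup product works internally. The restore_chain function and the (label, kind) tuples are names that I made up for illustration, and the sketch assumes that a given week contains a full backup followed by only one kind of partial backup.

```python
from typing import List, Tuple

# Each backup is (label, kind); kind is "full", "differential", or "incremental".
Backup = Tuple[str, str]


def restore_chain(history: List[Backup], target: int) -> List[str]:
    """Return the labels of the backups needed to restore history[target]."""
    label, kind = history[target]
    if kind == "full":
        # A full backup stands alone.
        return [label]
    # Locate the most recent full backup at or before the target.
    full_index = max(i for i in range(target + 1) if history[i][1] == "full")
    if kind == "differential":
        # A differential needs only its parent full backup and itself.
        return [history[full_index][0], label]
    # An incremental needs the full backup and every backup taken after it.
    return [name for name, _ in history[full_index:target + 1]]


# Hypothetical week: full on Sunday, partial backups on the following days.
weekly_incremental = [("Sun", "full"), ("Mon", "incremental"),
                      ("Tue", "incremental"), ("Wed", "incremental")]
weekly_differential = [("Sun", "full"), ("Mon", "differential"),
                       ("Tue", "differential"), ("Wed", "differential")]

print(restore_chain(weekly_incremental, 3))   # ['Sun', 'Mon', 'Tue', 'Wed']
print(restore_chain(weekly_differential, 3))  # ['Sun', 'Wed']
```

Restoring Wednesday from the incremental week needs four backups, while the differential week needs only two. Lose Monday's incremental and the first list can no longer be satisfied.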

Historically, full/incremental/differential backups have been facilitated by an archive bit in Windows. Anytime a file is changed, Windows sets its archive bit. The backup types operate with this behavior:

  • A full backup captures all target files and clears any archive bits that it finds.
  • A differential backup captures only target files that have their archive bit set and it leaves the bit in the state that it found it.
  • An incremental backup captures only files with the archive bit set and clears it afterward.
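The three rules above are easy to simulate. This Python sketch is only a model: the dictionary of file names and the run_backup function are hypothetical, and a real backup application reads the archive attribute from the file system rather than from a dictionary.

```python
def run_backup(files, kind):
    """Simulate one backup pass.

    files maps a file name to its archive bit (True means changed).
    Returns the sorted list of captured names and updates the bits in place.
    """
    if kind == "full":
        # A full backup captures everything, regardless of the bit.
        captured = sorted(files)
    else:
        # Differential and incremental capture only flagged files.
        captured = sorted(name for name, bit in files.items() if bit)
    if kind in ("full", "incremental"):
        # Full and incremental clear the bit; differential leaves it alone.
        for name in captured:
            files[name] = False
    return captured


files = {"a.txt": True, "b.txt": True, "c.txt": True}
full = run_backup(files, "full")              # all three files, bits cleared
files["a.txt"] = True                         # a.txt changes
diff_one = run_backup(files, "differential")  # ['a.txt'], bit stays set
files["b.txt"] = True                         # b.txt changes
diff_two = run_backup(files, "differential")  # ['a.txt', 'b.txt']
inc_one = run_backup(files, "incremental")    # ['a.txt', 'b.txt'], bits cleared
inc_two = run_backup(files, "incremental")    # [] because nothing changed since
print(full, diff_one, diff_two, inc_one, inc_two)
```

Notice that the second differential still includes a.txt because nothing cleared its bit, while the second incremental is empty.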

Archive Bit Example

Delta

“Delta” is probably the most overthought word in all of technology. It means “difference”. Do not analyze it beyond that. It just means “difference”. If you have $10 in your pocket and you buy an item for $8, the $8 that you spent is the “delta” between the amount of money that you had before you made the purchase and the amount of money that you have now.

The way that vendors use the term “delta” sometimes changes, but usually not by a great deal. In the earliest incarnation of “delta” as applied to backups that I am aware of, it meant intra-file changes. All previous backup types operated with individual files being the smallest level of granularity (not counting specialty backups such as Exchange item-level). Delta backups would analyze the blocks of individual files, making the granularity one step finer.

The following image illustrates the delta concept:

Delta Backup

A delta backup is essentially an incremental backup, but at the block level instead of the file level. Somebody got the clever idea to use the word “delta”, probably so that it wouldn’t be confused with “differential”, and the world thought it must mean something extra special because it’s Greek.

The major benefit of delta backups is that they use much less space than even incremental backups. The trade-off is in the computing power to calculate deltas. The archive bit can tell the backup application whether a file needs to be scanned, but it cannot say which blocks changed. Backup systems that perform delta operations require some other method for change tracking.
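One possible change-tracking method is to compare per-block hashes between the previous and current versions of a file. The Python sketch below illustrates that idea only; I am not claiming that any specific product does it this way. The block size, the function names, and the sample data are all assumptions, and the sketch ignores files that shrink.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small so the example is easy to read


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE):
    """Return {block index: block bytes} for blocks that are new or different."""
    old_digests = block_hashes(old, block_size)
    delta = {}
    for index, digest in enumerate(block_hashes(new, block_size)):
        if index >= len(old_digests) or old_digests[index] != digest:
            start = index * block_size
            delta[index] = new[start:start + block_size]
    return delta


old_version = b"AAAABBBBCCCCDDDD"
new_version = b"AAAAXXXXCCCCDDDDEEEE"
delta = changed_blocks(old_version, new_version)
print(delta)  # {1: b'XXXX', 4: b'EEEE'}
```

A file-level incremental would have copied all twenty bytes of the new version. The block-level delta copies eight, at the cost of hashing every block.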

Deduplication

Deduplication represents the latest iteration of backup innovation. The term explains itself quite nicely. The backup application searches for identical blocks of data and reduces them to a single copy.

Deduplication involves three major feats:

  • The algorithm that discovers duplicate blocks must operate in a timely fashion
  • The system that tracks the proper location of duplicated blocks must be foolproof
  • The system that tracks the proper location of duplicated blocks must use significantly less storage than simply keeping the original blocks

So, while deduplication is conceptually simple, implementations can depend upon advanced computer science.

Deduplication’s primary benefit is that it can produce backups that are even smaller than delta systems. Part of that will depend on the overall scope of the deduplication engine. If you were to run fifteen new Windows Server 2016 virtual machines through even a rudimentary deduplicator, it would reduce all of them to the size of approximately a single Windows Server 2016 virtual machine — a 93% savings.
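The fifteen-machine figure is simple arithmetic, and a toy deduplicator shows where it comes from. This Python sketch assumes fixed-size blocks identified by SHA-256 digests and fifteen byte-for-byte identical images, which real virtual machines never quite are. The deduplicate and rebuild functions are illustrative names, not any product's API.

```python
import hashlib


def deduplicate(blocks):
    """Store each distinct block once; keep an ordered recipe of digests."""
    store = {}
    recipe = []
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        recipe.append(digest)            # remember where the block belongs
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original block sequence from the store and recipe."""
    return [store[digest] for digest in recipe]


# Hypothetical image made of four blocks, copied for fifteen identical guests.
vm_image = [b"boot", b"kernel", b"system", b"apps"]
all_blocks = vm_image * 15

store, recipe = deduplicate(all_blocks)
savings = 1 - len(store) / len(all_blocks)
print(len(all_blocks), len(store), f"{savings:.0%}")  # 60 4 93%
```

The recipe is the tracking system from the list of feats above: if it is wrong, or if one stored block is lost, every guest that references that block is affected.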

There is risk in overeager implementations, however. With all data blocks represented by a single copy, each block becomes a single point of failure. The loss of a single vital block could spell disaster for a backup set. This risk can be mitigated by employing a single pre-existing best practice: always maintain multiple backups.

Inconsistent/Crash-Consistent/Application-Consistent

We already have an article set that explores these terms in some detail. Quickly:

  • Inconsistent backups would be effectively the same thing as performing a manual file copy of a directory tree.
  • Crash-consistent backup captures data as it sits on the storage volume at a given point in time, but cannot touch anything passing through the CPU or waiting in memory. You could lose any in-flight I/O operations.
  • Application-consistent backup coordinates with the operating system and, where possible, individual applications to ensure that in-flight I/Os are flushed to disk so that there are no active file changes at the moment that the backup is taken.

I occasionally see people twisting these terms around, although I believe that’s mostly accidental. The definitions that I used above have been the most common, stretching back into the 90s. Be aware that there are some disagreements, so ensure that you clarify terminology with any salespeople.

Bare-Metal Backup/Restore

A so-called “bare-metal backup” and/or “bare-metal restore” involves capturing the entirety of a storage unit including metadata portions such as the boot sector. These backup/restore types essentially mean that you could restore data to a completely empty physical system without needing to install an operating system and/or backup agent on it first.

Disaster Recovery/Business Continuity

The terms “Disaster Recovery” (DR) and “Business Continuity” are often used somewhat interchangeably in marketing literature. “Disaster Recovery” is the older term and more accurately reflects the nature of the involved solutions. “Business Continuity” is a newer, more exciting version that sounds more positive but mostly means the same thing. These two terms encompass not just restoring data, but restoring the organization to its pre-disaster state. “Business Continuity” is used to emphasize the notion that, with proper planning, disasters can have little to no effect on your ability to conduct normal business. Of course, the more “continuous” your solution is, the higher your costs are. That’s not necessarily a bad thing, but it must be understood and expected.

One thing that I really want to make very clear about disaster recovery and/or business continuity is that these terms extend far beyond just backing up and restoring your data. DR plans need to include downtime procedures, phone trees, alternative working sites, and a great deal more. You need to think all the way through a disaster from the moment that one occurs to the moment that everything is back to some semblance of normal.

Recovery Point Objective

The maximum acceptable span of time between the latest backup and a data loss event is called a recovery point objective (RPO). If the words don’t sound very much like their definition, that’s because someone worked really hard to couch a bad situation within a somewhat neutral term. If it helps, the “point” in RPO means “point in time.” Of all data adds and changes, anything that happens between backup events has the highest potential of being lost. Many technologies have some sort of fault tolerance built in; for instance, if your domain controller crashes and it isn’t due to a completely failed storage subsystem, you’re probably not going to need to go to backup. Most other databases can tell a similar story. RPOs mostly address human error and disaster. More common failures should be addressed by technology branches other than backup, such as RAID.

A long RPO means that you are willing to lose a greater span of time. A daily backup gives you a 24-hour RPO. Taking backups every two hours results in a 2-hour RPO. Remember that an RPO represents a maximum. It is highly unlikely that a failure will occur immediately prior to the next backup operation.
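If it helps to see the numbers, the following Python sketch measures how much work a failure would cost under two schedules. The function names, dates, and schedules are invented for the example, and the sketch assumes that every scheduled backup completed successfully at its start time.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Return the time between the last backup and the failure."""
    completed = [t for t in backup_times if t <= failure_time]
    if not completed:
        raise ValueError("no backup exists before the failure")
    return failure_time - max(completed)


def meets_rpo(backup_times, failure_time, rpo):
    """True if the data lost at failure_time fits within the stated RPO."""
    return data_loss_window(backup_times, failure_time) <= rpo


midnight = datetime(2017, 1, 2, 0, 0)
daily = [midnight]                                                   # one backup per day
two_hourly = [midnight + timedelta(hours=2 * n) for n in range(12)]  # every two hours
failure = datetime(2017, 1, 2, 9, 45)

print(data_loss_window(daily, failure))       # 9:45:00
print(data_loss_window(two_hourly, failure))  # 1:45:00
print(meets_rpo(daily, failure, timedelta(hours=2)))  # False
```

The actual loss in both cases is smaller than the schedule's maximum, which is the point of the last two sentences above: the RPO is the worst case, not the expected case.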

Recovery Time Objective

Recovery time objective (RTO) represents the maximum amount of time that you are willing to wait for systems to be restored to a functional state. This term sounds much more like its actual meaning than RPO. You need to take extra care when talking with backup vendors about RTO. They will tend to only talk about RTO in terms of restoring data to a replacement system. If your primary site is your only site and you don’t have a contingency plan for complete building loss, your RTO is however long it takes to replace that building, fill it with replacement systems, and restore data to those systems. Somehow, I suspect that a six-month or longer RTO is unacceptable for most institutions. That is one reason that DR planning must extend beyond taking backups.

In more conventional usage, RTOs will be explained as though there is always a target system ready to receive the restored data. So, if your backup drives are taken offsite to a safety deposit box by the bookkeeper when she comes in at 8 AM, your actual recovery time is essentially however long it takes someone to retrieve the backup drive plus the time needed to perform a restore in your backup application.

Retention

Retention is the desired amount of time that a backup should be kept. This deceptively simple description hides some complexity. Consider the following:

  • Legislation mandates a ten-year retention policy on customer data for your industry. A customer was added in 2007. Their address changed in 2009. Must the customer’s data be kept until 2017 or 2019?
  • Corporate policy mandates that all customer information be retained for a minimum of five years. The line-of-business application that you use to record customer information never deletes any information that was placed into it and you have a copy of last night’s data. Do you need to keep the backup copy from five years ago or is having a recent copy of the database that contains five-year-old data sufficient?

Questions such as these can plague you. Historically, monthly and annual backup tapes were simply kept for a specific minimum number of years and then discarded, which more or less answered the question for you. Tape is an expensive solution, however, and many modern small businesses do not use it. Furthermore, laws and policies only dictate that the data be kept; nothing forced anyone to ensure that the backup tapes were readable after any specific amount of time. One lesson that many people learn the hard way is that tapes stored flat can lose data after a few years. We used to joke with customers that their bits were sliding down the side of the tape. I don’t actually understand the governing electromagnetic phenomenon, but I can verify that it does exist.

With disk-based backups, the possibilities are changed somewhat. People typically do not keep stacks of backup disks lying around, and their ability to hold data for long periods of time is not the same as backup tape. The rules are different — some disks will outlive tape, others will not.

Rotation

Backup rotations deal with the media used to hold backup information. This has historically meant tape, and tape rotations often came in some very grandiose schemes. One of the most widely used rotations is called “Grandfather-Father-Son” (GFS):

  • One full backup is taken monthly. The media it is taken on is kept for an extended period of time, usually one year. One of these is often considered an annual and kept longer. This backup is called the “Grandfather”.
  • Each week thereafter, on the same day, another full backup is taken. This media is usually rotated so that it is re-used once per month. This backup is known as the “Father”.
  • On every day between full backups, an incremental backup is taken. Each day’s media is rotated so that it is re-used on the same day each week. This backup is known as the “Son”.

The purpose of rotation is to have enough backups to provide sufficient possible restore points to guard against a myriad of possible data loss instances without using so much media that you bankrupt yourself and run out of physical storage room. Grandfathers are taken offsite and placed in long-term storage. Fathers are taken offsite, but perhaps not placed in long-term storage so that they are more readily accessible. Sons are often left onsite, at least for a day or two, to facilitate rapid restore operations.
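A GFS calendar can be sketched in a few lines of Python. The specifics here are my assumptions, not part of any standard: full backups run on Sundays, the first Sunday of the month is the Grandfather, and the gfs_role function name is hypothetical.

```python
from datetime import date, timedelta


def gfs_role(day: date) -> str:
    """Classify a calendar day under an assumed Sunday-based GFS rotation."""
    if day.weekday() != 6:  # date.weekday(): Monday is 0, Sunday is 6
        return "son"        # daily incremental
    # The first Sunday of a month always falls on day 1 through 7.
    return "grandfather" if day.day <= 7 else "father"


# Tally the roles for January 2017, which began on a Sunday.
first = date(2017, 1, 1)
roles = [gfs_role(first + timedelta(days=offset)) for offset in range(31)]
counts = {role: roles.count(role) for role in ("grandfather", "father", "son")}
print(counts)  # {'grandfather': 1, 'father': 4, 'son': 26}
```

Even this stripped-down version shows why the media handling gets complicated: three classes of backup, each with its own reuse schedule and storage location.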

Replacing Old Concepts with New Best Practices

Some backup concepts are simply outdated, especially for the small business. Tape used to be the only feasible mass storage device that could be written and rewritten on a daily basis and was sufficiently portable. I recall being chastised by a vendor representative in 2004 because I was “still” using tape when I “should” be backing up to his expensive SAN. I asked him, “Oh, do employees tend to react well when someone says, ‘The building is on fire! Grab the SAN and get out!’?” He suddenly didn’t want to talk to me anymore.

The other somewhat outdated issue is that backups used to take a very, very long time. Tape was not very fast, disks were not very fast, networks were not very fast. Differential and incremental backups were partly the answer to that problem, and partly to the problem that tape capacity was an issue. Today, we have gigantic and relatively speedy portable hard drives, networks that can move at least many hundreds of megabits per second, and external buses like USB 3 that outrun both of those things. We no longer need all weekend and an entire media library to perform a full backup.

One thing that has not changed is the need for backups to exist offsite. You cannot protect against a loss of a building if all of your data stays in that building. Solutions have evolved, though. You can now afford to purchase large amounts of bandwidth and transmit your data offsite to your alternative business location(s) each night. If you haven’t got an alternative business location, there are an uncountable number of vendors that would be happy to store your data each night in exchange for a modest (or not so modest) sum of money. I still counsel periodically taking an offline offsite backup copy, as that is a solid way to protect your organization against malicious attacks (some of which can be by disgruntled staff).

These are the approaches that I would take today that would not have been available to me a few short years ago:

  • Favor full backups whenever possible — incremental, differential, delta, and deduplicated backups are wonderful, but they are incomplete by nature. It must never be forgotten that the strength of backup lies in the fact that it creates duplicates of data. Any backup technique that reduces duplication dilutes the purpose of backup. I won’t argue against anyone saying that there are many perfectly valid reasons for doing so, but such usage must be balanced. Backup systems are larger and faster than ever before; if you can afford the space and time for full copies, get full copies.
  • Steer away from complicated rotation schemes like GFS whenever possible. Untrained staff will not understand them and you cannot rely on the availability of trained staff in a crisis.
  • Encrypt every backup every time.
  • Spend the time to develop truly meaningful retention policies. You can easily throw a tape in a drawer for ten years. You’ll find that more difficult with a portable disk drive. Then again, have you ever tried restoring from a ten-year-old tape?
  • Be open to the idea of using multiple backup solutions simultaneously. If using a combination of applications and media types solves your problem and it’s not too much overhead, go for it.

There are a few best practices that are just as applicable now as ever:

  • Periodically test your backups to ensure that data is recoverable
  • Periodically review what you are backing up and what your rotation and retention policies are to ensure that you are neither shorting yourself on vital data nor wasting backup media space on dead information
  • Backup media must be treated as vitally sensitive mission-critical information and guarded against theft, espionage, and damage
    • Magnetic media must be kept away from electromagnetic fields
    • Tapes must be stored upright on their edges
    • Optical media must be kept in dark storage
    • All media must be kept in a cool environment with a constant temperature and low humidity
  • Never rely on a single backup copy. Media can fail, get lost, or be stolen. Backup jobs don’t always complete.

Hyper-V-Specific Backup Best Practices

I want to dive into the nuances of backup and Hyper-V more thoroughly in later articles, but I won’t leave you here without at least bringing them up.

  • Virtual-machine-level backups are a good thing. That might seem a bit self-serving since I’m writing for Altaro and they have a virtual-machine-level backup application, but I fit well here because of shared philosophy. A virtual-machine-level backup gives you the following:
    • No agent installed inside the guest operating system
    • Backups are automatically coordinated for all guests, meaning that you don’t need to set up some complicated staggered schedule to prevent overlaps
    • No need to reinstall guest operating systems separately from restoring their data
  • Hyper-V versions prior to 2016 do not have a native changed block tracking mechanism, so virtual-machine-level backup applications that perform delta and/or deduplication operations must perform a substantial amount of processing. Keep that in mind as you are developing your rotations and scheduling.
  • Hyper-V will coordinate between backup applications that run at the virtual-machine-level (like Altaro VM) and the VSS writer(s) within guest Windows operating systems and the integration components within Linux guest operating systems. This enables application-consistent backups without doing anything special other than ensuring that the integration components/services are up-to-date and activated.
  • For physical installations, no application can perform a bare metal restore operation any more quickly than you can perform a fresh Windows Server/Hyper-V Server installation from media (or better yet, a WDS system). Such a physical server should only have very basic configuration and only backup/management software installed. Therefore, backing up the management operating system is typically a completely pointless endeavor. If you feel otherwise, I want to know what you installed in the management operating system that would make a bare-metal restore worth your time, as I’m betting that such an application or configuration should not be in the management operating system at all.
  • Use your backup application’s ability to restore a virtual machine next to its original so that you can test data integrity

Follow-Up Articles

With the foundational material supplied in this article, I intend to work on further posts that expand on these thoughts in greater detail. If you have any questions or concerns about backing up Hyper-V, let me know. Anything that I can’t answer quickly in comments might find its way into an article.

Critical Status in Hyper-V Manager

I’m an admitted technophile. I like blinky lights and sleek chassis and that new stuff smell and APIs and clicking through interfaces. I wouldn’t be in this field otherwise. However, if I were to compile a list of my least favorite things about unfamiliar technology, that panicked feeling when something breaks would claim the #1 slot. I often feel that systems administration sits diametrically opposite medical care. We tend to be comfortable learning by poking and prodding at things while they’re alive. When they’re dead, we’re sweating — worried that anything we do will only make the situation worse. For many of us in the Hyper-V world, that feeling first hits with the sight of a virtual machine in “Critical” status.

If you’re there, I can’t promise that the world hasn’t ended. I can help you to discover what it means and how to get back on the road to recovery.

The Various “Critical” States in Hyper-V Manager

If you ever look at the underlying WMI API for Hyper-V, you’ll learn that virtual machines have a long list of “sick” and “dead” states. Hyper-V Manager distills these into a much smaller list for its display. If you have a virtual machine in a “Critical” state, you’re only given two control options: Connect and Delete:

crit_sample

We’re fortunate enough in this case that the Status column gives some indication as to the underlying problem. That’s not always the case. That tiny bit of information might not be enough to get you to the root of the problem.

For starters, be aware that any state that includes the word “Critical” typically means that the virtual machine’s storage location has a problem. The storage device might have failed. The host may not be able to connect to storage. If you’re using SMB 3, the host might be unable to authenticate.

You’ll notice that there’s a hyphen in the state display. Before the hyphen will be another word that indicates the current or last known power state of the virtual machine. In this case, it’s Saved. I’ve only ever seen three states:

  • Off-Critical: The virtual machine was off the last time the host was able to connect to it.
  • Saved-Critical: The virtual machine was in a saved state the last time the host was able to connect to it.
  • Paused-Critical: The paused state typically isn’t a past condition. This one usually means that the host can still talk to the storage location, but it has run out of free space.

There may be other states that I have not discovered. However, if you see the word “Critical” in a state, assume a storage issue.
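If you collect these state strings in a report or a monitoring script, the hyphen convention is easy to split apart. This Python sketch only interprets the display text described above. The describe_state function and the hint wording are mine, and it knows nothing about the underlying WMI states.

```python
# Hints taken from the three states described above.
KNOWN_HINTS = {
    "Off": "VM was off when the host last reached its storage",
    "Saved": "VM was saved when the host last reached its storage",
    "Paused": "storage is probably reachable but out of free space",
}


def describe_state(state: str) -> str:
    """Interpret a Hyper-V Manager state string such as 'Saved-Critical'."""
    power_state, separator, suffix = state.partition("-")
    if separator != "-" or suffix != "Critical":
        return "not critical"
    # Unknown prefixes still get the general advice: suspect storage.
    return KNOWN_HINTS.get(power_state,
                           "unknown power state; assume a storage issue")


for sample in ("Saved-Critical", "Paused-Critical", "Running"):
    print(sample, "->", describe_state(sample))
```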

Learning More About the Problem

If you have a small installation, you probably already know enough at this point to go find out what’s wrong. If you have a larger system, you might only be getting started. With only Connect and Delete, you can’t find out what’s wrong. You need to start by discovering the storage location that’s behind all of the fuss. Since Hyper-V Manager won’t help you, it’s PowerShell to the rescue:

Remember to use your own virtual machine name for best results. The first of those two lines will show you all of the virtual machine’s properties. It’s easier to remember in a pinch, but it also displays a lot of fields that you don’t care about. The second one pares the output list down to show only the storage-related fields. My output:

crit_powershellexample

The Status field specifically mentioned the configuration location. As you can see, the same storage location holds all of the components of this particular virtual machine. We are not looking at anything related to the virtual hard disks, though. For that, we need a different cmdlet:

Again, I recommend that you use the name of your virtual machine instead of mine. The first cmdlet will show a table display that includes the path of the virtual hard disk file, but it will likely be truncated. There’s probably enough to get you started. If not, the second shows the entire path.

crit_samplepsharddriveexample

Everything that makes up this virtual machine happens to be on the same SMB 3 share. If yours is on an iSCSI target, use iscsicpl.exe to check the status of connected disks. If you’re using Fibre Channel, your vendor’s software should be able to assist you.

Correcting the Problem

In my case, the Server service was stopped on the system that I use to host SMB 3 shares. It got that way because I needed to set up a scenario for this article. To return the virtual machine to a healthy state, I only needed to start that service and wait a few moments.

Your situation will likely be different from mine, of course. Your first goal is to rectify the root of the problem. If the storage is offline, bring it up. If there’s a disconnect, reconnect. After that, simply wait. Everything should take care of itself.

When I power down my test cluster, I tend to encounter this issue upon turning everything back on. I could start my storage unit first, but the domain controllers are on the Hyper-V hosts so nothing can authenticate to the storage unit even if it’s on. I could start the Hyper-V hosts first, but then the storage unit isn’t there to challenge authentication. So, I just power the boxes up in whatever order I come to them. All I need to do is wait — the Hyper-V hosts will continually try to reach storage, and they’ll eventually be successful.

If the state does not automatically return to a normal condition, restart the “Hyper-V Virtual Machine Management” service. You’ll find it by that name in the Services control panel applet. Its short service name is vmms, so in an elevated PowerShell session you can run Restart-Service vmms, and at an administrative command prompt you can run net stop vmms followed by net start vmms.

That should clear up any remaining status issues. If it doesn’t, there is still an issue communicating with storage. Or, in the case of the Paused condition, it still doesn’t believe that the location has sufficient space to safely run the virtual machine(s).

Less Common Corrections

If you’re certain that the target storage location does not have issues and the state remains Critical, then I would move on to repairs. Try chkdsk. Try resetting/rebooting the storage system. It’s highly unlikely that the Hyper-V host is at fault, but you can also try rebooting that.

Sometimes, the constituent files are damaged or simply gone. Make sure that you can find the actual .xml (2012 R2 and earlier) or .vmcx (2016 and later) file that represents the virtual machine. Remember that it’s named with the virtual machine’s unique identifier. You can find that with PowerShell:
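The original command is missing from this copy. A minimal sketch of one way to do it follows; the virtual machine name 'svtest' is a placeholder of mine, not a name from the article, and the file search assumes that the configuration location is reachable from the host:

```powershell
# 'svtest' is a hypothetical name; substitute your own virtual machine's name.
$vm = Get-VM -Name 'svtest'

# VMId is the unique identifier that the configuration file is named after.
$vm | Select-Object -Property Name, VMId, ConfigurationLocation

# Look for files that start with that identifier under the configuration location.
# If this returns nothing (or errors), the file is missing or the storage is unreachable.
Get-ChildItem -Path $vm.ConfigurationLocation -Recurse -Filter "$($vm.VMId)*"
```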

If the files are misplaced or damaged, your best option is to restore them from backup. If that’s not an option, then Delete might be your only choice. Delete will remove any remainders of the virtual machine’s configuration files, but will not touch any virtual hard disks that belong to the virtual machine. You can create a new virtual machine and reattach those disk files.

Best of luck to you.

Confusing Terms and Concepts in Hyper-V

If I ever got a job at Microsoft, I’d want my title to be “He Who Fixes Stupid Names and Labels”. Depending upon my mood, I envision that working out in multiple ways. Sometimes, I see myself in a product meeting with someone locked in a condescending glare, and asking, “Really? With nearly one million unique words in the English language, we’re going with ‘Core’? Again?” Other times, I see myself stomping around like Gordon Ramsay, bellowing, “This wording is so unintelligible that it could be a plot for a Zero Wing sequel!” So, now you know one of the many reasons that I don’t work for Microsoft. But, the degree of my fitness to work in a team aside, the abundance of perplexing aspects of the Hyper-V product generates endless confusion for newcomers. I’ve compiled a shortlist to help cut through a few of them.

Azure

This particular item doesn’t have a great deal of relevance to Hyper-V for most of us. On the back end, there is a great deal of intersection in the technologies. Site Recovery allows you to replicate your on-premises virtual machines into Azure. But, there’s not a lot of confusion about the technology that I’m aware of. It’s listed here, and first, as an example of what we’re up against. Think about what the word “azure” means. It is the color of a clear, cloudless sky. You think one thing when a salesman walks in and says, “Hi, we’d like to introduce you to our cloud product called ‘Azure’.” That sounds nice, right? What if, instead, he said, “Hi, we’d like to introduce you to our cloud product called ‘Cloudless’.” What?

"Microsoft Drab" Just Doesn't have the Same Ring

Azure’s misnomer appears to be benign, as it works and it’s selling very well. I just want you to be aware that, if you’re confused when reading a product label or a dialog box, it’s probably not your fault. Microsoft doesn’t appear to invest many resources in the “Thoroughly Think Through Phrasing” department.

What Should I Call Hyper-V, Anyway?

Some of the confusion kicks in right at the beginning. Most people know that Hyper-V is Microsoft’s hypervisor, which is good. But, then they try to explain what they’re using, and everything immediately goes off the rails.

First, there’s Hyper-V. That part, we all understand. Or, at least we think that we understand. When you just use the word “Hyper-V”, that’s just the hypervisor. It’s completely independent of how you acquired or installed or use the hypervisor. It applies equally to Hyper-V Server, Windows Server with Hyper-V, and Nano Server with Hyper-V.

Second, there’s Client Hyper-V. It’s mostly Hyper-V, but without as many bells and whistles. Client Hyper-V is only found in the client editions of Windows, conveniently enough. So, if you’ve installed some product whose name includes the word “Server”, then you are not using Client Hyper-V. Simple enough, right?

Third, there’s the fictitious “Hyper-V Core”. I’ve been trying to get people to stop saying this for years, but I’m giving up now. Part of it is that it’s just not working. Another part of it:

[Screenshot: a Microsoft reference to “Hyper-V Core”]

With Microsoft actively working against me, I don’t like my odds. Sure, they’ve cleaned up a lot of these references, but I suspect they’ll never completely go away.

What I don’t like about the label/name “Hyper-V Core” is that it implies the existence of “Hyper-V not Core”. Therefore, people download Hyper-V Server and want to know why it’s all command-line based. People will also go to the forums and ask for help with “Hyper-V Core”, so then there’s at least one round of, “What product are you really using?”

What Does it Mean to “Allow management operating system to share this network adapter”?

The setting in question appears on the Virtual Switch Manager’s dialog when you create a virtual switch in Hyper-V Manager:

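[Screenshot: the “Allow management operating system to share this network adapter” setting in Virtual Switch Manager]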

The corresponding PowerShell parameter for New-VMSwitch is AllowManagementOs.

If I had that job that we were talking about a bit ago, the Hyper-V Manager line would say, “Connect the management operating system to this virtual switch.” The PowerShell parameter would be ConnectManagementOs. Then the labels would be true, explainable, and comprehensible.

Whether you choose the Hyper-V Manager path or the PowerShell route, this function creates a virtual network adapter for the management operating system and attaches it to the virtual switch that you’re creating. It does not “share” anything, at least not in any sense that this phrasing evokes. For more information, we have an article that explains the Hyper-V virtual switch.
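To see that behavior for yourself, here is a minimal sketch. The switch name 'vSwitch' and the physical adapter name 'Ethernet' are assumptions for illustration; use the names from your own host, and expect a brief network interruption on that adapter while the switch is created:

```powershell
# Hypothetical names: 'vSwitch' for the new switch, 'Ethernet' for the physical adapter.
New-VMSwitch -Name 'vSwitch' -NetAdapterName 'Ethernet' -AllowManagementOS $true

# The "shared" adapter is really a new virtual adapter that belongs to the
# management operating system and is connected to the virtual switch.
Get-VMNetworkAdapter -ManagementOS -SwitchName 'vSwitch'
```

The second command should list a virtual network adapter owned by the management operating system, which is all that the checkbox ever created.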

I Downloaded and Installed Hyper-V. Where Did My Windows 7/8/10 Go?

I see this question often enough to know that there are a significant number of people that encounter this problem. The trainer in me must impart a vital life lesson: If the steps to install a product include anything like “boot from a DVD or DVD image”, then it is making substantial and potentially irreversible changes.

If you installed Hyper-V Server, your original operating environment is gone. You may not be out of luck, though. If you didn’t delete the volume, then your previous operating system is in a folder called “Windows.old”. Don’t ask me or take this to the Hyper-V forums, though, because this is not a Hyper-V problem. Find a forum for the operating system that you lost and ask how to recover it from the Windows.old folder. There are no guarantees.

Many of the people that find themselves in this position claim that Microsoft didn’t warn them, which is absolutely not true.

The first warning occurs if you attempt to upgrade. It prevents you from doing so and explicitly says what the only other option, “Custom”, will do:

[Screenshot: the warning shown when attempting an upgrade]

If you never saw that because you selected Custom first, then you saw this warning:

[Screenshot: the warning shown for the Custom option]

That warning might be a bit too subtle, but you had another chance. After choosing Custom, you then decided to either install over the top of what you had or delete a partition. Assuming that you opted to use what was there, you saw this dialog:

[Screenshot: the dialog shown when installing over an existing installation]

The dialog could use some cleanup to cover the fact that it might have detected something other than a previous installation of Hyper-V Server, but there’s a clear warning that something new is pushing out something old. If you chose to delete the volume so that you could install Hyper-V Server on it, that warning is inescapably blatant:

[Screenshot: the warning shown when deleting a volume]

If this has happened to you, then I’m sorry, but you were warned. You were warned multiple times.

How Many Hyper-V Virtual Switches Should I Use?

I often see questions in this category from administrators that have VMware experience. Hyper-V’s virtual switch is markedly different from what VMware does, so you should not expect a direct knowledge transfer.

The default answer to this question is always “one”. If you’re going to be putting your Hyper-V hosts into a cluster, that strengthens the case for only one. A single Hyper-V virtual switch performs VLAN isolation and identifies local MAC addresses to prevent superfluous trips to the physical network for intra-VM communications. So, you rarely gain anything from using two or more virtual switches. We have a more thorough article on the subject of multiple Hyper-V switches.

Checkpoint? Snapshot? Well, Which Is it?

To save time, I’m going to skip definitions here. This is just to sort out the terms. A Hyper-V checkpoint is a Hyper-V snapshot. They are not different. The original term in Hyper-V was “snapshot”. That caused confusion with the Volume Shadow Copy Service (VSS) snapshot. Hyper-V’s daddy, “Virtual Server”, used the term “checkpoint”. System Center Virtual Machine Manager has always used the term “checkpoint”. The “official” terms have been consolidated into “checkpoint”. You’ll still find many references to snapshots, such as:

[Screenshot: a lingering reference to “snapshot”]

But We Officially Don’t Say “Snapshot”

We writers are looking forward to many more years of saying “checkpoint (or snapshot)”.

Do I Delete a Checkpoint? Or Merge It? Or Apply It? Or Something Else? What is Going on Here?

If you’re the person that developed the checkpoint actions, all of these terms make a lot of sense. If you’re anyone else, they’re an unsavory word soup.

  • Delete: “Delete” is confusing because deleting a checkpoint keeps your changes. Coming into this cold, you might think that deleting a checkpoint would delete changes. Just look under the hood, though. When you create a checkpoint, it makes copies of the virtual machine’s configuration files and starts using new ones. When you delete that checkpoint, that tells Hyper-V to delete the copies of the old configuration. That makes more sense, right? Hyper-V also merges the data in post-checkpoint differencing disks back into the originals, then deletes the differencing disks.
  • Merge (checkpoint): When you delete a checkpoint (see previous bullet point), the differencing disks that were created for its attached virtual hard disks are automatically merged back into the original. You can’t merge a checkpoint, though. That’s not a thing. That can’t be a thing. How would you merge a current VM with 2 vCPUs and its previous setting with 4 vCPUs? Split the difference? Visitation of 2 vCPUs every other weekend?
  • Merge (virtual hard disk): First, make sure that you understand the previous bullet point. If there’s a checkpoint, you want to delete it and allow that process to handle the virtual hard disk merging on your behalf. Otherwise, you’ll bring death and pestilence. If the virtual hard disk in question is not related to a checkpoint but still has a differencing disk, then you can manually merge them.
  • Apply: The thought process behind this term is just like the thinking behind Delete. Remember those copies that your checkpoint made? When you apply the checkpoint, the settings in those old files are applied to the current virtual machine. That means that applying a checkpoint discards your changes. As for the virtual hard disks, Hyper-V stops using the differencing disk that was created when the virtual machine was checkpointed and starts using a new differencing disk that is a child of the original virtual hard disk. Whew! Get all of that?
  • Revert: This verb makes sense to everyone, I think. It reverts the current state of the virtual machine to the checkpoint state. Technologically, Hyper-V applies the settings from the old files and discards the differencing disk. It creates a new, empty differencing disk and starts the virtual machine from it. In fact, the only difference between Revert and Apply is the opportunity to create another checkpoint to hold the changes that you’re about to lose. If I had that job, there would be no Apply. There would only be Revert (keep changes in a new checkpoint) and Revert (discard changes).
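The same operations exist in PowerShell, where the cmdlet nouns still say “Snapshot”. The following is a sketch rather than a procedure: 'svtest' and 'before-change' are placeholder names of mine, and you would normally pick either the apply or the delete step, not run both back to back.

```powershell
$vmName = 'svtest'              # hypothetical virtual machine name
$checkpointName = 'before-change'  # hypothetical checkpoint name

# Create a checkpoint: Hyper-V copies the configuration and starts a differencing disk.
Checkpoint-VM -Name $vmName -SnapshotName $checkpointName

# List the checkpoints that exist for the virtual machine.
Get-VMSnapshot -VMName $vmName

# Apply: go back to the checkpointed settings and disks, discarding later changes.
Restore-VMSnapshot -VMName $vmName -Name $checkpointName -Confirm:$false

# Delete: keep the current state, drop the old copies, merge the differencing disks.
Remove-VMSnapshot -VMName $vmName -Name $checkpointName
```

Note that there is no “merge checkpoint” cmdlet here; the merge happens as a consequence of Remove-VMSnapshot.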

If this is tough to keep straight, it might make you feel better to know that my generation was expected to remember that Windows boots from the system disk to run its system from the boot disk. No one has ever explained that one to me. When you’re trying to keep this checkpoint stuff straight, just try to think of it from the perspective of the files that constitute a checkpoint.

If you want more information on checkpoints, I happen to like one of my earlier checkpoint articles. I would also recommend searching the blog on the “checkpoint” keyword, as we have many articles written by myself and others.

Dynamic Disks and Dynamically Expanding Virtual Hard Disks

“Dynamically expanding virtual hard disk” is a great big jumble of words that nobody likes to say. So, almost all of us shorten it to “dynamic disk”. Then, someone sees that the prerequisites list for the product that they want to use says, “does not support dynamic disks”. Panic ensues.

Despite common usage, these terms are not synonymous.

With proper planning and monitoring, dynamically expanding hard disks are perfectly safe to use.
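For the monitoring part, a minimal sketch: it compares how much space each virtual hard disk file currently consumes against the maximum that it can grow to. The name 'svtest' is a placeholder, and the sketch assumes that every disk attached to the virtual machine is a VHD or VHDX file rather than a pass-through disk.

```powershell
# 'svtest' is a hypothetical name; substitute your own virtual machine's name.
$paths = (Get-VMHardDiskDrive -VMName 'svtest').Path

Get-VHD -Path $paths | Select-Object -Property Path, VhdType,
    @{ Name = 'FileSizeGB'; Expression = { [math]::Round($_.FileSize / 1GB, 1) } },
    @{ Name = 'MaxSizeGB';  Expression = { [math]::Round($_.Size / 1GB, 1) } }
```

Somewhat unhelpfully for this discussion, VhdType reports a dynamically expanding virtual hard disk as “Dynamic”.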

Conversely, Dynamic disks are mostly useless. A handful of products require them, but hopefully they’ll all die soon (or undergo a redesign, that could work too). In the absence of an absolute, defined need, you should never use Dynamic disks. The article linked in the previous paragraph explains the Dynamic disk, if you’re interested. For a quicker explanation, just look at this picture from Disk Management:

Basic and Dynamic Disks

Dynamic disks, in the truest sense of the term, are not a Hyper-V technology.

Which Live Migration Do I Want?

I was attempting to answer a forum question in which the asker was configuring Constrained Delegation so that he could Live Migrate a virtual machine from one physical cluster node to another physical node in the same cluster. I rightly pointed out that nodes in the same cluster do not require delegation. It took a while for me to understand that he was attempting to perform a Shared Nothing Live Migration of an unclustered guest between the two nodes. That does require delegation in some cases.

To keep things straight, understand that Hyper-V offers multiple virtual machine migration technologies. Despite all of them including the word “migration” and most of them including the word “live”, they are different. They are related because they all move something Hyper-V, but they are not interchangeable terms.

This is the full list:

  • Quick Migration: Quick Migration moves a virtual machine from one host to another within a cluster. Just so it’s said, the virtual machine must be clustered, not simply on a cluster node. It is usually the fastest of the migration techniques because nothing is transmitted across the network. If the virtual machine is on, it is first saved. Ownership is transferred to the target node. If the virtual machine was placed in a saved state for the move, it is resumed.
  • Live Migration: A Live Migration has the same requirement as a Quick Migration: it is only applicable to clustered virtual machines. Additionally, the virtual machine must be turned on (otherwise, it wouldn’t be “live”). Live Migration is slower than Quick Migration because CPU threads, memory, and pending I/O must be transferred to the target host, but it does not involve an interruption in service. The virtual machine experiences no outage except for the propagation of its network adapters’ MAC address change throughout the network.
  • Storage Live Migration: A Storage Live Migration involves the movement of any files related to a virtual machine. It could be all of them, or it could be any subset. “Storage Live Migration” is just a technology name; the phrase never appears anywhere in any of the tools. You select one of the options to “Move” and then you choose to only move storage. You can choose a new target location on the same host or remote storage, but a Storage Live Migration by itself cannot change a virtual machine’s owner to a new physical host. Unlike a “Live Migration”, the “Live” in “Storage Live Migration” is optional.
  • Shared Nothing Live Migration: The “Shared Nothing” part of this term can cause confusion because it isn’t true. The “live” bit doesn’t help, because the VM can be off or saved, if you want. The idea is that the source and destination hosts don’t need to be in the same cluster, so they don’t need to share a common storage pool. Their hosts do need to share a domain and at least one network, though. I’m not sure what I would have called this one, so maybe I’m glad that I don’t have that job. Anyway, as with Storage Live Migration, you’ll never see this phrase in any of the tools. It’s simply one of the “move” options.
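In PowerShell, the four types map to different cmdlets, which is another way to keep them apart. This is a sketch of alternatives, not a script to run from top to bottom; the virtual machine name, host names, and paths are all placeholders that I made up for illustration.

```powershell
# Quick Migration and Live Migration: clustered virtual machines only.
# -Name is the name of the clustered role, which usually matches the VM name.
Move-ClusterVirtualMachineRole -Name 'svtest' -Node 'hv02' -MigrationType Quick
Move-ClusterVirtualMachineRole -Name 'svtest' -Node 'hv01' -MigrationType Live

# Storage migration: files move, but the owning host stays the same.
Move-VMStorage -VMName 'svtest' -DestinationStoragePath 'C:\ClusterStorage\Volume2\svtest'

# Shared Nothing Live Migration: an unclustered VM moves to another host,
# here taking its storage along with it.
Move-VM -Name 'svtest' -DestinationHost 'hv02' -IncludeStorage -DestinationStoragePath 'D:\VMs\svtest'
```

The first pair comes from the FailoverClusters module and the second pair from the Hyper-V module, which mirrors the clustered versus non-clustered split described in the list.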

If you’re seeking help from others, it’s important to use the proper term. Otherwise, your confusion will become their confusion and you might never find any help.

What Else?

I’ve been doing this long enough that I might be missing other things that just don’t make sense. Let us know what’s boggled you about Hyper-V and we’ll add it to the list.

Disk Fragmentation is not Hyper-V’s Enemy

Fragmentation is the most crippling problem in computing, wouldn’t you agree? I mean, that’s what the strange guy downtown paints on his walking billboard, so it must be true, right? And fragmentation is at least five or six or a hundred times worse for a VHDX file, isn’t it? All the experts are saying so, according to my psychic.

But, when I think about it, my psychic also told me that I’d end up rich with a full head of hair. And, I watched that downtown guy lose a bet to a fire hydrant. Maybe those two aren’t the best authorities on the subject. Likewise, most of the people that go on and on about fragmentation can’t demonstrate anything concrete that would qualify them as storage experts. In fact, they sound a lot like that guy that saw your employee badge in the restaurant line and ruined your lunch break by trying to impress you with all of his anecdotes proving that he “knows something about computers” in the hopes that you’d put in a good word for him with your HR department (and that they have a more generous attitude than his previous employers on the definition of “reasonable hygiene practices”).

To help prevent you from ever sounding like that guy, we’re going to take a solid look at the “problem” of fragmentation.

Where Did All of this Talk About Fragmentation Originate?

Before I get very far into this, let me point out that all of this jabber about fragmentation is utter nonsense. Most people that are afraid of it don’t know any better. The people that are trying to scare you with it either don’t know what they’re talking about or are trying to sell you something. If you’re about to go to the comments section with some story about that one time that a system was running slowly but you set everything to rights with a defrag, save it. I once bounced a quarter across a twelve foot oak table, off a guy’s forehead, and into a shot glass. Our anecdotes are equally meaningless, but at least mine is interesting and I can produce witnesses.

The point is, the “problem” of fragmentation is mostly a myth. Like most myths, it does have some roots in truth. To understand the myth, you must know its origins.

These Aren’t Your Uncle’s Hard Disks

In the dark ages of computing, hard disks were much different from the devices that you know and love today. I’m young enough that I missed the very early years, but the first one owned by my family consumed the entire top of a desktop computer chassis. I was initially thrilled when my father presented me with my very own PC as a high school graduation present. I quickly discovered that it was a ploy to keep me at home a little longer because it would be quite some time before I could afford an apartment large enough to hold its hard drive. You might be thinking, “So what, they were physically bigger. I have a dozen magazines claiming that size doesn’t matter!” Well, those articles weren’t written about computer hard drives, were they? In hard drives, physical characteristics matter.

Old Drives Were Physically Larger

The first issue is diameter. Or, more truthfully, radius. You see, there’s a little arm inside that hard drive whose job it is to move back and forth from the inside edge to the outside edge of the platter and back, picking up and putting down bits along the way. That requires time. The further the distance, the more time required. Even if we pretend that actuator motors haven’t improved at all, less time is required to travel a shorter distance. I don’t know actual measurements, but it’s a fair guess that those old disks had over a 2.5-inch radius, whereas modern 3.5″ disks are closer to a 1.5″ radius and 2.5″ disks something around a 1″ radius. It doesn’t sound like much until you compare them by percentage differences. Modern enterprise-class hard disks have less than half the maximum read/write head travel distance of those old units.

[Figure: read/write head travel distance across the platter]

It’s not just the radius. The hard disk that I had wasn’t only wide, it was also tall. That’s because it had more platters in it than modern drives. That’s important because, whereas each platter has its own set of read/write heads, a single motor controls all of the arms. Each additional platter increases the likelihood that the read/write head arm will need to move a meaningful distance to find data between any two read/write operations. That adds time.

Old Drives Were Physically Slower

After size, there’s rotational speed. The read/write heads follow a line from the center of the platter out to the edge of the platter, but that’s their only range of motion. If a head isn’t above the data that it wants, then it must hang around and wait for that data to show up. Today, we think of 5,400 RPM drives as “slow”. That drive of mine was moping along at a meager 3,600 RPM. That meant even more time was required to get/set data.
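To put rough numbers on that wait, here is a small sketch. It uses the common approximation that, on average, the head waits half a revolution for the data to come around; the function name is mine, and the figures ignore seek time and everything else that a real drive does.

```powershell
function Get-AverageRotationalLatencyMs {
    param([Parameter(Mandatory)][double]$Rpm)
    # One revolution takes 60,000 ms / RPM; on average the wait is half a revolution.
    return (60000.0 / $Rpm) / 2
}

foreach ($rpm in 3600, 5400, 7200, 10000, 15000) {
    '{0,6} RPM: {1:N2} ms average rotational latency' -f $rpm, (Get-AverageRotationalLatencyMs -Rpm $rpm)
}
```

By this estimate, a 3,600 RPM drive waits a little over 8 ms on average for the platter alone, roughly twice as long as a 7,200 RPM drive.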

There were other factors that impacted speed as well, although none quite so strongly as rotational speed improvements. The point is, physical characteristics in old drives meant that they pushed and pulled data much more slowly than modern drives.

Old Drives Were Dumb

Up until the mid-2000s, every drive in (almost) every desktop computer used a PATA IDE or EIDE interface (distinction is not important for this discussion). A hard drive’s interface is the bit that sits between the connecting cable bits and the spinning disk/flying head bits. It’s the electronic brain that figures out where to put data and where to go get data. IDE brains are dumb (another word for “cheap”). They operate on a FIFO (first-in first-out) basis. This is an acronym that everyone knows but almost no one takes a moment to think about. For hard drives, it means that each command is processed in exactly the order in which it was received. Let’s say that it gets the following:

  1. Read data from track 1
  2. Write data to track 68,022
  3. Read data from track 2

An IDE drive will perform those operations in exactly that order, even though it doesn’t make any sense. If you ever wondered why SCSI drives were so much more expensive than IDE drives, that was part of the reason. SCSI drives were a lot smarter. They would receive a list of demands from the host computer, plot the optimal course to satisfy those requests, and execute them in a logical fashion.
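A toy model makes the difference visible. The sketch below counts how many tracks the head crosses for the three requests above, first in arrival order and then sorted by track. Starting at track 0 and sorting in ascending order are simplifying assumptions of mine; a real controller also weighs rotation position and other factors.

```powershell
function Get-HeadTravel {
    param([int]$StartTrack, [int[]]$Tracks)
    $position = $StartTrack
    $total = 0
    foreach ($track in $Tracks) {
        # Add the distance from the current position to the requested track.
        $total += [math]::Abs($track - $position)
        $position = $track
    }
    return $total
}

$requests = 1, 68022, 2

# FIFO: service the requests exactly as they arrived.
$fifoTravel = Get-HeadTravel -StartTrack 0 -Tracks $requests

# Reordered: service the nearby tracks first, then make the long trip once.
$reorderedTravel = Get-HeadTravel -StartTrack 0 -Tracks ($requests | Sort-Object)

"FIFO order:      $fifoTravel tracks crossed"
"Reordered order: $reorderedTravel tracks crossed"
```

In this model the FIFO drive crosses 136,042 tracks while the reordering drive crosses 68,022, so the dumb interface travels about twice as far for the same three requests.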

In the mid-2000s, we started getting new technology. AHCI and SATA emerged from the primordial acronym soup as Promethean saviors, bringing NCQ (native command queuing) to the lowly IDE interface. For the first time, IDE drives began to behave like SCSI drives. … OK, that’s overselling NCQ. A lot. It did help, but not as much as it might have because…

Operating Systems Take More Responsibility

It wasn’t just hard drives that operated in FIFO. Operating systems started it. They had good excuses, though. Hard drives were slow, but so were all of the other components. A child could conceive of better access techniques than FIFO, but even PhDs struggled against the CPU and memory requirements to implement them. Time changed all of that. Those other components gained remarkable speed improvements while hard disks lagged behind. Before “NCQ” was even coined, operating systems learned to optimize requests before sending them to the IDE’s FIFO buffers. That’s one of the ways that modern operating systems manage disk access better than those that existed at the dawn of defragmentation, but it’s certainly not alone.

This Isn’t Your Big Brother’s File System

The venerated FAT file system did its duty and did it well. But, the nature of disk storage changed dramatically, which is why we’ve mostly stopped using FAT. Now we have NTFS, and even that is becoming stale. Two things that it does a bit better than FAT are metadata placement and file allocation. Linux admins will be quick to point out that virtually all of their file systems are markedly better at preventing fragmentation than NTFS. However, most of the tribal knowledge around fragmentation on the Windows platform sources from the FAT days, and NTFS is certainly better than FAT.

Some of Us Keep Up with Technology

It was while I owned that gigantic, slow hard drive that the fear of fragmentation wormed its way into my mind. I saw some very convincing charts and graphs and read a very good spiel and I deeply absorbed every single word and took the entire message to heart. That was also the same period of my life in which I declined free front-row tickets to Collective Soul to avoid rescheduling a first date with a girl with whom I knew I had no future. It’s safe to say that my judgment was not sound during those days.

Over the years, I became a bit wiser. I looked back and realized some of the mistakes that I’d made. In this particular case, I slowly came to understand that everything that convinced me to defragment was marketing material from a company that sold defragmentation software. I also forced myself to admit that I never could detect any post-defragmentation performance improvements. I had allowed the propaganda to sucker me into climbing onto a bandwagon carrying a lot of other suckers, and we reinforced each other’s delusions.

That said, we were mostly talking about single-drive systems in personal computers. That transitions right into the real problem with the fragmentation discussion.

Server Systems are not Desktop Systems

I was fortunate enough that my career did not immediately shift directly from desktop support into server support. I worked through a gradual transition period. I also enjoyed the convenience of working with top-tier server administrators. I learned quickly, and thoroughly, that desktop systems and server systems are radically different.

Usage Patterns

You rely on your desktop or laptop computer for multiple tasks. You operate e-mail, web browsing, word processing, spreadsheet, instant messaging, and music software on a daily basis. If you’re a gamer, you’ve got that as well. Most of these applications use small amounts of data frequently and haphazardly; some use large amounts of data, also frequently and haphazardly. The ratio of write operations to read operations is very high, with writes commonly outnumbering reads.

Servers are different. Well-architected servers in an organization with sufficient budget will run only one application or application suite. If they use much data, they’ll rely on a database. In almost all cases, server systems perform substantially more read operations than write operations.

The end result is that server systems almost universally have more predictable disk I/O demands and noticeably higher cache hits than desktop systems. Under equal fragmentation levels, they’ll fare better.

Storage Hardware

Whether or not you’d say that server-class systems contain “better” hardware than desktop systems is a matter of perspective. Server systems usually provide minimal video capabilities and their CPUs have gigantic caches but are otherwise unremarkable. That only makes sense; playing the newest Resident Evil at highest settings with a smooth frame rate requires substantially more resources than a domain controller for 5,000 users. Despite what many lay people have come to believe, server systems typically don’t work very hard. We build them for reliability, not speed.

Where servers have an edge is storage. SCSI has a solid record as the premier choice for server-class systems. For many years, it was much more reliable, although the differences are negligible today. One advantage that SCSI drives maintain over their less expensive cousins is higher rotational speeds. Of all the improvements that I mentioned above, the most meaningful advance in IDE drives was the increase of rotational speed from 3,600 RPM to 7,200 RPM. That’s a 100% gain. SCSI drives ship with 10,000 RPM motors (~39% faster than 7,200 RPM) and 15,000 RPM motors (108% faster than 7,200 RPM!).

Spindle speed doesn’t address the reliability issue, though. Hard drives need many components, and a lot of them move. Mechanical failure due to defect or wear is a matter of “when”, not “if”. Furthermore, they are susceptible to things that other component designers don’t even think about. If you get very close to a hard drive and shout at it while it’s powered, you can cause data loss. Conversely, my solid-state phone doesn’t seem to suffer nearly as much as I do even after the tenth attempt to get “OKAY GOOGLE!!!” to work as advertised.

Due to the fragility of spinning disks, almost all server systems architects design them to use multiple drives in a redundant configuration (lovingly known as RAID). The side effect of using multiple disks like this is a speed boost. We’re not going to talk about different RAID types because that’s not important here. The real point is that in practically all cases, a RAID configuration is faster than a single disk configuration. The more unique spindles in an array, the higher its speed.

With SCSI and RAID, it’s trivial to achieve speeds that are many multiples faster than a single disk system. If we assume that fragmentation has ill effects and that defragmentation has positive effects, they are mitigated by the inherent speed boosts of this topology.

These Differences are Meaningful

When I began classes to train desktop support staff to become server support staff, I managed to avoid asking any overly stupid questions. My classmates weren’t so lucky. One asked about defragmentation jobs on server systems. The echoes of laughter were still reverberating through the building when the instructor finally caught his breath enough to choke out, “We don’t defragment server systems.” The student was mortified into silence, of course. Fortunately, there were enough shared sheepish looks that the instructor felt compelled to explain it. That was in the late ’90s, so the explanation was a bit different then, but it still boiled down to differences in usage and technology.

With today’s technology, we should be even less fearful of fragmentation in the datacenter, but my observations seem to indicate that the reverse has happened. My guess is that training isn’t what it used to be and we simply have too many server administrators that were promoted off of the retail floor or the end-user help desk a bit too quickly. This is important to understand, though. Edge cases aside, fragmentation is of no concern for a properly architected server-class system. If you are using disks of an appropriate speed in a RAID array of an appropriate size, you will never realize meaningful performance improvements from a defragmentation cycle. If you are experiencing issues that you believe are due to fragmentation, expanding your array by one member (or two for RAID-10) will return substantially greater yields than the most optimized disk layout.

Disk Fragmentation and Hyper-V

To conceptualize the effect of fragmentation on Hyper-V, just think about the effect of fragmentation in general. When you think of disk access on a fragmented volume, you’ve probably got something like this in mind:

Jumpy Access

Look about right? Maybe a bit more complicated than that, but something along those lines, yes?

Now, imagine a Hyper-V system. It’s got, say, three virtual machines with their VHDX files in the same location. They’re all in the fixed format and the whole volume is nicely defragmented and pristine. As the virtual machines are running, what does their disk access look like to you? Is it like this?

Jumpy Access

If you’re surprised that the pictures are the same, then I don’t think that you understand virtualization. All VMs require I/O and they all require that I/O more or less concurrently with I/O needs of other VMs. In the first picture, access had to skip a few blocks because of fragmentation. In the second picture, access had to skip a few blocks because it was another VM’s turn. I/O will always be a jumbled mess in a shared-storage virtualization world. There are mitigation strategies, but defragmentation is the most useless.
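To make the comparison concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the block numbers are invented, and the strict round-robin turn-taking is an idealization of how a real storage stack services requests. It counts how often the next access does not land on the block right after the previous one, first for one reader walking a fragmented file, then for three VMs reading perfectly contiguous files.

```python
def fragmented_read(extents):
    """Block order seen when one reader walks a file stored in scattered extents."""
    return [block for start, length in extents
            for block in range(start, start + length)]


def interleaved_reads(vm_starts, blocks_per_vm):
    """Block order seen when VMs with contiguous files take turns, one block each."""
    return [start + offset
            for offset in range(blocks_per_vm)
            for start in vm_starts]


def jumps(sequence):
    """Count accesses that do not land on the block right after the previous one."""
    return sum(1 for prev, cur in zip(sequence, sequence[1:]) if cur != prev + 1)


# Hypothetical layouts: (start block, length) pairs and VM file start blocks.
fragmented = fragmented_read([(0, 2), (500, 2), (100, 2), (900, 2)])
shared = interleaved_reads([0, 1000, 2000], blocks_per_vm=3)

print(jumps(fragmented), "jumps in", len(fragmented) - 1, "moves (fragmented file)")
print(jumps(shared), "jumps in", len(shared) - 1, "moves (three contiguous VMs)")
```

In this toy model, the fragmented file jumps on 3 of its 7 moves, while the three pristine, contiguous VMs jump on all 8 of theirs. Real workloads issue larger requests and benefit from caching and queuing, so treat the numbers as an illustration of the shape of the problem, not a measurement.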

For fragmentation to be a problem, it must interrupt what would have otherwise been a smooth read or write operation. In other words, fragmentation is most harmful on systems that commonly perform long sequential reads and/or writes. A typical Hyper-V system hosting server guests is unlikely to perform meaningful quantities of long sequential reads and/or writes.

Disk Fragmentation and Dynamically-Expanding VHDX

Fragmentation is the most egregious of the copious, terrible excuses that people give for not using dynamically-expanding VHDX. If you listen to them, they’ll paint a beautiful word picture that will have you daydreaming that all the bits of your VHDX files are scattered across your LUNs like a bag of trail mix. I just want to ask anyone who tells those stories: “Do you own a computer? Have you ever seen a computer? Do you know how computers store data on disks? What about Hyper-V, do you have any idea how that works?” I’m thinking that there’s something lacking on at least one of those two fronts: computers or Hyper-V.

The notion fronted by the scare message is that your virtual machines are just going to drop a few bits here and there until your storage looks like a finely sifted hodge-podge of multicolored powders. The truth is that your virtual machines are going to allocate a great many blocks in one shot, maybe do so again at a later point in time, but will soon reach a sort of equilibrium. Consider an example VM that uses a dynamically-expanding disk:

  • You create a new application server from an empty Windows Server template. Hyper-V writes that new VHDX copy as contiguously as the storage system can allow.
  • You install the primary application. This causes Hyper-V to request many new blocks all at once. A large singular allocation results in the most contiguous usage possible.
  • The primary application goes into production.
    • If it’s the sort of app that works with big gobs of data at a time, then Hyper-V writes big gobs, which are more or less contiguous.
    • If it’s the sort of app that works with little bits of data at a time, then fragmentation won’t matter much anyway.
  • Normal activities cause a natural ebb and flow of the VM’s data usage (ex: downloading and deleting Windows Update files). A VM will re-use previously used blocks because that’s what computers do.
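The life cycle above can be sketched with a toy allocator. This is a deliberately simplified model, not how NTFS or Hyper-V actually place data: it assumes an append-only volume where every growth request receives one contiguous run, and the file names and block counts are made up. The point it illustrates is that the number of fragments tracks the number of growth bursts, not the number of blocks written.

```python
class ToyVolume:
    """Append-only allocator: each growth request gets one contiguous run."""

    def __init__(self):
        self.next_free = 0
        self.extents = {}  # file name -> list of (start block, length)

    def grow(self, name, blocks):
        runs = self.extents.setdefault(name, [])
        if runs and runs[-1][0] + runs[-1][1] == self.next_free:
            # The new run directly follows this file's last run, so merge them.
            start, length = runs[-1]
            runs[-1] = (start, length + blocks)
        else:
            runs.append((self.next_free, blocks))
        self.next_free += blocks


volume = ToyVolume()
volume.grow("app1.vhdx", 400)  # copy of the template
volume.grow("app2.vhdx", 400)  # a second VM deployed alongside it
volume.grow("app1.vhdx", 300)  # application install: one large request
volume.grow("app2.vhdx", 50)
volume.grow("app1.vhdx", 10)   # occasional growth once in production

for name, runs in volume.extents.items():
    total = sum(length for _, length in runs)
    print(name, "uses", total, "blocks in", len(runs), "extents")
```

In this model, app1.vhdx ends up with 710 blocks in 3 extents and app2.vhdx with 450 blocks in 2. Writes that land in blocks the VHDX already owns, such as the Windows Update churn in the last bullet, never call `grow` at all, which is why the file settles into equilibrium.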

How to Address Fragmentation in Hyper-V

I am opposed to ever taking any serious steps toward defragmenting a server system. It’s just a waste of time, and it causes a great deal of age-advancing disk thrashing. If you’re really concerned about disk performance, these are the best choices:

  • Add spindles to your storage array
  • Use faster disks
  • Use a faster array type
  • Don’t virtualize

If you have read all of this and done all of these things and you are still panicked about fragmentation, then there is still something that you can do. Get an empty LUN or other storage space that can hold your virtual machines. Use Storage Live Migration to move all of them there. Then, use Storage Live Migration to move them all back, one at a time. It will line them all up neatly end-to-end. If you want, copy in some “buffer” files in between each one and delete them once all VMs are in place. These directions come with a warning: you will never recover the time necessary to perform that operation.
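If you do go down that road, the round trip can be planned before you touch anything. The following Python sketch only builds and prints the PowerShell commands; it does not run them. It assumes the `Move-VMStorage` cmdlet with its `-VMName` and `-DestinationStoragePath` parameters, and the VM names, drive letters, and one-folder-per-VM layout are all hypothetical placeholders that you would replace with your own.

```python
def quote(value):
    """Wrap a value in PowerShell single quotes, doubling any embedded quote."""
    return "'" + value.replace("'", "''") + "'"


def move_command(vm_name, base_path):
    """Build one Move-VMStorage command that targets a per-VM folder."""
    destination = base_path.rstrip("\\") + "\\" + vm_name
    return ("Move-VMStorage -VMName " + quote(vm_name)
            + " -DestinationStoragePath " + quote(destination))


def build_round_trip_plan(vm_names, scratch_path, home_path):
    """Move every VM to the scratch location first, then back one at a time."""
    outbound = [move_command(vm, scratch_path) for vm in vm_names]
    inbound = [move_command(vm, home_path) for vm in vm_names]
    return outbound + inbound


# Hypothetical VM names and paths, for illustration only.
plan = build_round_trip_plan(["web01", "sql01"], "S:\\Scratch", "V:\\VMs")

for command in plan:
    print(command)
```

Printing the plan instead of executing it is deliberate: it lets you review the order and the destinations, and confirm that the scratch location has room for everything, before committing hours of disk time to the operation.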
