Free Hyper-V Script: Virtual Machine Storage Consistency Diagnosis

Hyper-V allows you to scatter a virtual machine’s files just about anywhere. That might be good; it might be bad. That all depends on your perspective and need. No good can come from losing track of your files’ sprawl, though. Additionally, the placement and migration tools don’t always work exactly as expected. Even if the tools work perfectly, sometimes administrators don’t. To help you sort out these difficulties, I’ve created a small PowerShell script that will report on the consistency of a virtual machine’s file placement.

How the Script Works

I designed the script to be quick and simple to use and understand. Under the hood, it does these things:

  1. Gathers the location of the virtual machine’s configuration files. It considers that location to be the root.
  2. Gathers the location of the virtual machine’s checkpoint files. It compares that to the root. If the locations aren’t the same, it marks the VM’s storage as inconsistent.
  3. Gathers the location of the virtual machine’s second-level paging files. It compares that to the root. If the locations aren’t the same, it marks the VM’s storage as inconsistent.
  4. Gathers the location of the virtual machine’s individual virtual hard disk files. It compares each to the root. If the locations aren’t the same, it marks the VM’s storage as inconsistent.
  5. Gathers the location of any CD or DVD image files attached to the virtual machine. It compares each to the root. If the locations aren’t the same, it marks the VM’s storage as inconsistent.
  6. Emits a custom object that contains:
    1. The name of the virtual machine
    2. The name of the virtual machine’s Hyper-V host
    3. The virtual machine’s ID (a string-formatted GUID)
    4. A boolean ($true or $false) value that indicates if the virtual machine’s storage is consistent
    5. An array that contains a record of each location. Each entry in the array is itself a custom object with these two pieces:
      1. The component name (Configuration, Checkpoints, etc.)
      2. The location of the component

It doesn’t check for pass-through. A VM with pass-through disks would have inconsistent storage placement by definition, but this script completely ignores pass-through.

The script doesn’t go through a lot of validation to check if storage locations are truly unique. If you have a VM using “\\system\VMs” and “\\system\VMFiles” but they’re both the same physical folder, the VM will be marked as inconsistent.
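
The check itself boils down to simple path comparisons against the root. The following is only a simplified sketch of that idea, not the script’s actual code (the real script also examines attached ISOs and honors the switches described below):

    # $VM is assumed to be the output of Get-VM for a single virtual machine
    $Root = $VM.ConfigurationLocation
    $Consistent = $true
    foreach ($Location in @($VM.SnapshotFileLocation, $VM.SmartPagingFilePath))
    {
        if ($Location -ne $Root) { $Consistent = $false }
    }
    foreach ($Disk in $VM.HardDrives)
    {
        if ((Split-Path -Path $Disk.Path -Parent) -ne $Root) { $Consistent = $false }
    }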

Script Usage

I built the script to be named Get-VMStorageConsistency. All documentation and examples are built around that name. You must supply it with a virtual machine. All other parameters are optional.

Use Get-Help Get-VMStorageConsistency -Full to see the built-in help.

Parameters:

  • VM (aliases “VMName” and “Name”): The virtual machine(s) to check. The input type is an array, so it will accept one or more of any of the following:
    • A string that contains the virtual machine’s name
    • A VirtualMachine object (as output from Get-VM, etc.)
    • A GUID object that contains the virtual machine’s ID
    • A WMI Msvm_ComputerSystem object
    • A WMI MSCluster_Resource object
  • ComputerName: The name of the host that owns the virtual machine. If not specified, defaults to the local system. Ignored if VM is anything other than a String or GUID. Only accepts strings.
  • DisksOnly: A switch parameter (include if you want to use it, leave off otherwise). If specified, the script only checks at the physical storage level for consistency. Examples:
    • With this switch, C:\VMs and C:\VMCheckpoints are the same thing (simplifies to C:\)
    • With this switch, C:\ClusterStorage\VMs1\VMFiles and C:\ClusterStorage\VMs1\VHDXs are the same thing (simplifies to C:\ClusterStorage\VMs1)
    • With this switch, \\storage1\VMs\VMFiles and \\storage1\VMs\VHDFiles are the same thing (simplifies to \\storage1\VMs)
    • Without this switch, all of the above are treated as unique locations
  • IgnoreVHDFolder: A switch parameter (include if you want to use it, leave off otherwise). If specified, ignores the final “Virtual Hard Disks” path for virtual hard disks. Notes:
    • With this switch, VHDXs in “C:\VMs\Virtual Hard Disks” will be treated as though they were found in C:\VMs
    • With this switch, VHDXs in “C:\VMs\Virtual Hard Disks\VHDXs” will not be treated specially. This is because there is a folder underneath the one named “Virtual Hard Disks”.
    • With this switch, a VM configured to hold its checkpoints in a folder named “C:\VMs\Virtual Hard Disks\” will not treat its checkpoints specially. This is because the switch only applies to virtual hard disk files.
  • Verbose: A built-in switch parameter. If specified, the script will use the Verbose output stream to show you the exact comparison that caused a virtual machine to be marked as inconsistent. Note: The script only traps the first item that causes a virtual machine to be inconsistent. That’s because the Consistent marker is a boolean; in boolean logic, it’s not possible to become more false. Therefore, I considered it to be wasteful to continue processing. However, all locations are stored in the Locations property of the report. You can use that for an accurate assessment of all the VMs’ component locations.
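
Putting the parameters together, here are a few sample invocations. These are sketches only; the script path, host name, and VM names are placeholders:

    # Check a single VM on the local host
    C:\Scripts\Get-VMStorageConsistency.ps1 -VM 'svtest'

    # Check a VM by name on a remote host, only caring about the physical storage level
    C:\Scripts\Get-VMStorageConsistency.ps1 -VM 'svtest' -ComputerName 'hv01' -DisksOnly

    # Ignore the default "Virtual Hard Disks" subfolder and show what tripped the inconsistency flag
    C:\Scripts\Get-VMStorageConsistency.ps1 -VM (Get-VM) -IgnoreVHDFolder -Verbose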

The Output Object

The following shows the output of the cmdlet run on my host with the -IgnoreVHDFolder and -Verbose switches set:

[Screenshot: Get-VMStorageConsistency output with the -IgnoreVHDFolder and -Verbose switches]

Output object structure:

  • Name: String that contains the virtual machine’s name
  • ComputerName: String that contains the name of the Hyper-V host that currently owns the virtual machine
  • VMId: String representation of the virtual machine’s GUID
  • Consistent: Boolean value that indicates whether or not the virtual machine’s storage is consistently placed
  • Location: An array of custom objects that contain information about the location of each component

Location object structure:

  • Component: A string that identifies the component. I made these strings up. Possible values:
    • Configuration: Location of the “Virtual Machines” folder that contains the VM’s definition files (.xml, .vmcx, .bin, etc.)
    • Checkpoints: Location of the “Snapshots” folder configured for this virtual machine. Note: That folder might not physically exist if the VM uses a non-default location and has never been checkpointed.
    • SecondLevelPaging: Location of the folder where the VM will place its .slp files if it ever uses second-level paging.
    • Virtual Hard Disk: Full path of the virtual hard disk.
    • CD/DVD Image: Full path of the attached ISO.

An example that puts the object to use:
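
This is a minimal sketch; it assumes that you saved the script as C:\Scripts\Get-VMStorageConsistency.ps1:

    C:\Scripts\Get-VMStorageConsistency.ps1 -VM (Get-VM) |
        Where-Object -FilterScript { -not $_.Consistent } |
        Select-Object -ExpandProperty Name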

The above will output only the names of local virtual machines with inconsistent storage.

Script Source

The script is intended to be named “Get-VMStorageConsistency”. As shown, you would call its file (ex.: C:\Scripts\Get-VMStorageConsistency). If you want to use it dot-sourced or in your profile, uncomment lines 55, 56, and 262 (subject to change from editing; look for the commented-out function { } delimiters).

 

95 Best Practices for Optimizing Hyper-V Performance

We can never get enough performance. Everything needs to be faster, faster, faster! You can find any number of articles about improving Hyper-V performance and best practices. Unfortunately, a lot of that information contains errors, FUD, and misconceptions, and some of it is just plain dated. Technology has changed, and experience continually teaches us new insights. From that, we can build a list of best practices that will help you tune your system for maximum performance.

Philosophies Used in this Article

This article focuses primarily on performance. It may deviate from other advice that I’ve given in other contexts. A system designed with performance in mind will be built differently from a system with different goals. For instance, a system that tries to provide high capacity at a low price point would have a slower performance profile than some alternatives.

  • Subject matter scoped to the 2012 R2 and 2016 product versions.
  • I want to stay on target by listing the best practices with fairly minimal exposition. I’ll expand ideas where I feel the need; you can always ask questions in the comments section.
  • I am not trying to duplicate pure physical performance in a virtualized environment. That’s a wasted effort.
  • I have already written an article on best practices for balanced systems. It’s a bit older, but I don’t see anything in it that requires immediate attention. It was written for the administrator who wants reasonable performance but also wants to stay under budget.
  • This content targets datacenter builds. Client Hyper-V will follow the same general concepts with variable applicability.

General Host Architecture

If you’re lucky enough to be starting in the research phase — meaning, you don’t already have an environment — then you have the most opportunity to build things properly. Making good purchase decisions pays more dividends than patching up something that you’ve already got.

  1. Do not go in blind.
    • Microsoft Assessment and Planning Toolkit will help you size your environment: MAP Toolkit
    • Ask your software vendors for their guidelines for virtualization on Hyper-V.
    • Ask people that use the same product(s) if they have virtualized on Hyper-V.
  2. Stick with logo-compliant hardware. Check the official list: https://www.windowsservercatalog.com/
  3. Most people will run out of memory first, disk second, CPU third, and network last. Purchase accordingly.
  4. Prefer newer CPUs, but think hard before going with bleeding edge. You may need to improve performance by scaling out. Live Migration requires physical CPUs to be the same or you’ll need to enable CPU compatibility mode. If your environment starts with recent CPUs, then you’ll have the longest amount of time to be able to extend it. However, CPUs commonly undergo at least one revision, and that might be enough to require compatibility mode. Attaining maximum performance may reduce virtual machine mobility.
  5. Set a target density level, e.g. “25 virtual machines per host”. While it may be obvious that higher densities result in lower performance, finding the cut-off line for “acceptable” will be difficult. However, having a target VM number in mind before you start can make the challenge less nebulous.
  6. Read the rest of this article before you do anything.

Management Operating System

Before we carry on, I just wanted to make sure to mention that Hyper-V is a type 1 hypervisor, meaning that it runs right on the hardware. You can’t “touch” Hyper-V because it has no direct interface. Instead, you install a management operating system and use that to work with Hyper-V. You have three choices:

  • Windows Server with the full GUI (Desktop Experience)
  • Windows Server Core
  • Hyper-V Server

Note: Nano Server initially offered Hyper-V, but that functionality will be removed (or has already been removed, depending on when you read this). Most people ignore the fine print of using Nano Server, so I never recommended it anyway.

TL;DR: In absence of a blocking condition, choose Hyper-V Server. A solid blocking condition would be the Automatic Virtual Machine Activation feature of Datacenter Edition. In such cases, the next preferable choice is Windows Server in Core mode.

I organized those in order by distribution size. Volumes have been written about the “attack surface” and patching. Most of that material makes me roll my eyes. No matter what you think of all that, none of it has any meaningful impact on performance. For performance, concern yourself with the differences in CPU and memory footprint. The widest CPU/memory gap lies between Windows Server and Windows Server Core. When logged off, the Windows Server GUI does not consume many resources, but it does consume some. The space between Windows Server Core and Hyper-V Server is much tighter, especially when the same features/roles are enabled.

One difference between Core and Hyper-V Server is the licensing mechanism. On Datacenter Edition, that does include the benefit of Automatic Virtual Machine Activation (AVMA). That only applies to the technological wiring. Do not confuse it with the oft-repeated myth that installing Windows Server grants guest licensing privileges. The legal portion of licensing stands apart; read our eBook for starting information.

Because you do not need to pay for the license for Hyper-V Server, it grants one capability that Windows Server does not: you can upgrade at any time. That allows you to completely decouple the life cycle of your hosts from your guests. Such detachment is a hallmark of the modern cloud era.

If you will be running only open source operating systems, Hyper-V Server is the natural choice. You don’t need to pay any licensing fees to Microsoft at all with that usage. I don’t realistically expect any pure Linux shops to introduce a Microsoft environment, but Linux-on-Hyper-V is a fantastic solution in a mixed-platform environment. And with that, let’s get back onto the list.

Management Operating System Best Practices for Performance

  1. Prefer Hyper-V Server first, Windows Server Core second
  2. Do not install any software, feature, or role in the management operating system that does not directly aid the virtual machines or the management operating system. Hyper-V prioritizes applications in the management operating system over virtual machines. That’s because it trusts you; if you are running something in the management OS, it assumes that you really need it.
  3. Do not log on to the management operating system. Install the management tools on your workstation and manipulate Hyper-V remotely.
  4. If you must log on to the management operating system, log off as soon as you’re done.
  5. Do not browse the Internet from the management operating system. Don’t browse from any server, really.
  6. Stay current on mainstream patches.
  7. Stay reasonably current on driver versions. I know that many of my peers like to install drivers almost immediately upon release, but I can’t join that camp. While it’s not entirely unheard of for a driver update to bring performance improvements, it’s not common. With all of the acquisitions and corporate consolidations going on in the hardware space — especially networking — I feel that the competitive drive to produce quality hardware and drivers has entered a period of decline. In simple terms, view new drivers as a potential risk to stability, performance, and security.
  8. Join your hosts to the domain. Systems consume less of your time if they answer to a central authority.
  9. Use antivirus and intrusion prevention. As long as you choose your anti-malware vendor well and the proper exclusions are in place, performance will not be negatively impacted. Compare that to the performance of a compromised system.
  10. Read through our article on host performance tuning.

Leverage Containers

In the “traditional” virtualization model, we stand up multiple virtual machines running individual operating system environments. As “virtual machine sprawl” sets in, we wind up with a great deal of duplication. In the past, we could justify that as a separation of the environment. Furthermore, some Windows Server patches caused problems for some software but not others. In the modern era, containers and omnibus patch packages have upset that equation.

Instead of building virtual machine after virtual machine, you can build a few virtual machines. Deploy containers within them. Strategies for this approach exceed the parameters of this article, but you’re aiming to reduce the number of disparate complete operating system environments deployed. With careful planning, you can reduce density while maintaining a high degree of separation for your services. Fewer kernels are loaded, fewer context switches occur, less memory contains the same code bits, fewer disk seeks to retrieve essentially the same information from different locations.

  1. Prefer containers over virtual machines where possible.

CPU

You can’t do a great deal to tune CPU performance in Hyper-V. Overall, I count that among my list of “good things”; Microsoft did the hard work for you.

  1. Follow our article on host tuning; pay special attention to C States and the performance power settings.
  2. For Intel chips, leave hyperthreading on unless you have a defined reason to turn it off.
  3. Leave NUMA enabled in hardware. On your VMs’ property sheet, you’ll find a Use Hardware Topology button. Remember to use that any time that you adjust the number of vCPUs assigned to a virtual machine or move it to a host that has a different memory layout (physical core count and/or different memory distribution).
    [Screenshot: virtual machine NUMA configuration settings]
  4. Decide whether or not to allow guests to span NUMA nodes (the global host NUMA Spanning setting; a PowerShell sketch follows this list). If you size your VMs to stay within a NUMA node and you are careful to not assign more guests than can fit solidly within each NUMA node, then you can increase individual VM performance. However, if the host has trouble locking VMs into nodes, then you can negatively impact overall memory performance. If you’re not sure, just leave NUMA at defaults and tinker later.
  5. For modern guests, I recommend that you use at least two virtual CPUs per virtual machine. Use more in accordance with the virtual machine’s performance profile or vendor specifications. This is my own personal recommendation; I can visibly detect the response difference between a single vCPU guest and a dual vCPU guest.
  6. For legacy Windows guests (Windows XP/Windows Server 2003 and earlier), use 1 vCPU. More will likely hurt performance more than help.
  7. Do not grant more than 2 vCPU to a virtual machine without just cause. Hyper-V will do a better job reducing context switches and managing memory access if it doesn’t need to try to do too much core juggling. I’d make exceptions for very low-density hosts where 2 vCPU per guest might leave unused cores. At the other side, if you’re assigning 24 cores to every VM just because you can, then you will hurt performance.
  8. If you are preventing VMs from spanning NUMA nodes, do not assign more vCPU to a VM than you have matching physical cores in a NUMA node (usually means the number of cores per physical socket, but check with your hardware manufacturer).
  9. Use Hyper-V’s priority, weight, and reservation settings with great care. CPU bottlenecks are highly uncommon; look elsewhere first. A poor reservation will cause more problems than it solves.
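
As a sketch of the NUMA spanning and vCPU sizing points above (the VM name is a placeholder), both settings are exposed through the Hyper-V PowerShell module:

    # Host-wide: control whether guests may span NUMA nodes
    Set-VMHost -NumaSpanningEnabled $false

    # Per-VM: give a modern guest two virtual CPUs
    Set-VMProcessor -VMName 'svtest' -Count 2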

Memory

I’ve long believed that every person that wants to be a systems administrator should be forced to become conversant in x86 assembly language, or at least C. I can usually spot people that have no familiarity with programming in such low-level languages because they almost invariably carry a bizarre mental picture of how computer memory works. Fortunately, modern memory is very, very, very fast. Even better, the programmers of modern operating system memory managers have gotten very good at their craft. Trying to tune memory as a systems administrator rarely pays dividends. However, we can establish some best practices for memory in Hyper-V.

  1. Follow our article on host tuning. Most importantly, if you have multiple CPUs, install your memory such that it uses multi-channel and provides an even amount of memory to each NUMA node.
  2. Be mindful of operating system driver quality. Windows drivers differ from applications in that they can permanently remove memory from the available pool. If they do not properly manage that memory, then you’re headed for some serious problems.
  3. Do not make your CSV cache too large.
  4. For virtual machines that will perform high quantities of memory operations, avoid dynamic memory. Dynamic memory disables NUMA (out of necessity). How do you know what constitutes a “high volume”? Without performance monitoring, you don’t.
  5. Set your fixed memory VMs to a higher priority and a shorter startup delay than your Dynamic Memory VMs. This ensures that they will start first, allowing Hyper-V to plot an optimal NUMA layout and reduce memory fragmentation. It doesn’t help a lot in a cluster, unfortunately. However, even in the best case, this technique won’t yield many benefits.
  6. Do not use more memory for a virtual machine than you can prove that it needs. Especially try to avoid using more memory than will fit in a single NUMA node.
  7. Use Dynamic Memory for virtual machines that do not require the absolute fastest memory performance.
  8. For Dynamic Memory virtual machines, pay the most attention to the startup value. It sets the tone for how the virtual machine will be treated during runtime. For virtual machines running full GUI Windows Server, I tend to use a startup of either 1 GB or 2 GB, depending on the version and what else is installed.
  9. For Dynamic Memory VMs, set the minimum to the operating system vendor’s stated minimum (512 MB for Windows Server). If the VM hosts a critical application, add to the minimum to ensure that it doesn’t get choked out.
  10. For Dynamic Memory VMs, set the maximum to a reasonable amount. You’ll generally discover that amount through trial and error and performance monitoring. Do not set it to an arbitrarily high number. Remember that, even on 2012 R2, you can raise the maximum at any time.
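
As a sketch of points 8 through 10 (the VM name and sizes are illustrative only; settle on real values through monitoring):

    # Dynamic Memory: startup sets the tone, minimum follows the OS vendor's floor, maximum stays reasonable
    Set-VMMemory -VMName 'app01' -DynamicMemoryEnabled $true -StartupBytes 2GB -MinimumBytes 512MB -MaximumBytes 8GB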

Check the CPU section for NUMA guidance.

Networking

In the time that I’ve been helping people with Hyper-V, I don’t believe that I’ve seen anyone waste more time worrying about anything that’s less of an issue than networking. People will read whitepapers and forums and blog articles and novels and work all weekend to draw up intricately designed networking layouts that need eight pages of documentation. But, they won’t spend fifteen minutes setting up a network utilization monitor. I occasionally catch grief for using MRTG since it’s old and there are shinier, bigger, bolder tools, but MRTG is easy and quick to set up. You should know how much traffic your network pushes. That knowledge can guide you better than any abstract knowledge or feature list.

That said, we do have many best practices for networking performance in Hyper-V.

  1. Follow our article on host tuning. Especially pay attention to VMQ on gigabit and separation of storage traffic.
  2. If you need your network to go faster, use faster adapters and switches. A big team of gigabit won’t keep up with a single 10 gigabit port.
  3. Use a single virtual switch per host. Multiple virtual switches add processing overhead. Usually, you can get a single switch to do whatever you wanted multiple switches to do.
  4. Prefer a single large team over multiple small teams. This practice can also help you to avoid needless virtual switches.
  5. For gigabit, anything over 4 physical ports probably won’t yield meaningful returns. I would use 6 at the outside. If you’re using iSCSI or SMB, then two more physical adapters just for that would be acceptable.
  6. For 10GbE, anything over 2 physical ports probably won’t yield meaningful returns.
  7. If you have 2 10GbE and a bunch of gigabit ports in the same host, just ignore the gigabit. Maybe use it for iSCSI or SMB, if it’s adequate for your storage platform.
  8. Make certain that you understand how the Hyper-V virtual switch functions. Most important:
    • You cannot “see” the virtual switch in the management OS except with Hyper-V specific tools. It has no IP address and no presence in the Network and Sharing Center applet.
    • Anything that appears in Network and Sharing Center that you think belongs to the virtual switch is actually a virtual network adapter.
    • Layer 3 (IP) information in the host has no bearing on guests — unless you create an IP collision
  9. Do not create a virtual network adapter in the management operating system for the virtual machines. I did that before I understood the Hyper-V virtual switch, and I have encountered lots of other people that have done it. The virtual machines will use the virtual switch directly.
  10. Do not multi-home the host unless you know exactly what you are doing. Valid reasons to multi-home:
    • iSCSI/SMB adapters
    • Separate adapters for cluster roles. e.g. “Management”, “Live Migration”, and “Cluster Communications”
  11. If you multi-home the host, give only one adapter a default gateway. If other adapters must use gateways, use the old route command or the new New-NetRoute command (see the sketch after this list).
  12. Do not try to use internal or private virtual switches for performance. The external virtual switch is equally fast. Internal and private switches are for isolation only.
  13. If all of your hardware supports it, enable jumbo frames. Ensure that you perform validation testing (i.e.: ping storage-ip -f -l 8000)
  14. Pay attention to IP addressing. If traffic needs to locate an external router to reach another virtual adapter on the same host, then traffic will traverse the physical network.
  15. Use networking QoS if you have identified a problem.
    • Use datacenter bridging, if your hardware supports it.
    • Prefer the Weight QoS mode for the Hyper-V switch, especially when teaming.
    • To minimize the negative side effects of QoS, rely on limiting the maximums of misbehaving or non-critical VMs over trying to guarantee minimums for vital VMs.
  16. If you have SR-IOV-capable physical NICs, it provides the best performance. However, you can’t use the traditional Windows team for the physical NICs. Also, you can’t use VMQ and SR-IOV at the same time.
  17. Switch-embedded teaming (2016) allows you to use SR-IOV. Standard teaming does not.
  18. If using VMQ, configure the processor sets correctly.
  19. When teaming, prefer Switch Independent mode with the Dynamic load balancing algorithm. I have done some performance testing on the types (near the end of the linked article). However, a reader commented on another article that the Dynamic/Switch Independent combination can cause some problems for third-party load balancers (see comments section).
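
For point 11, a sketch of adding a specific route instead of a second default gateway; the prefix, interface alias, and next hop are placeholders for your own values:

    # Reach a remote storage subnet through a dedicated adapter rather than a second default gateway
    New-NetRoute -DestinationPrefix '192.168.50.0/24' -InterfaceAlias 'iSCSI1' -NextHop '192.168.40.1'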

Storage

When you need to make real differences in Hyper-V’s performance, focus on storage. Storage is slow. The best way to make storage not be slow is to spend money. But, we have other ways.

  1. Follow our article on host tuning. Especially pay attention to:
    • Do not break up internal drive bays between Hyper-V and the guests. Use one big array.
    • Do not tune the Hyper-V partition for speed. After it boots, Hyper-V averages zero IOPS for itself. As a prime example, don’t put Hyper-V on SSD and the VMs on spinning disks. Do the opposite.
    • The best way to get more storage speed is to use faster disks and bigger arrays. Almost everything else will only yield tiny differences.
  2. For VHD (not VHDX), use fixed disks for maximum performance. Dynamically-expanding VHD is marginally, but measurably, slower.
  3. For VHDX, use dynamically-expanding disks for everything except high-utilization databases. I receive many arguments on this, but I’ve done the performance tests and have years of real-world experience. You can trust that (and run the tests yourself), or you can trust theoretical whitepapers from people that make their living by overselling disk space but have perpetually misplaced their copy of diskspd.
  4. Avoid using shared VHDX (2012 R2) or VHDS (2016). Performance still isn’t there. Give this technology another maturation cycle or two and look at it again.
  5. Where possible, do not use multiple data partitions in a single VHD/X.
  6. When using Cluster Shared Volumes, try to use at least as many CSVs as you have nodes. Starting with 2012 R2, CSV ownership will be distributed evenly, theoretically improving overall access.
  7. You can theoretically improve storage performance by dividing virtual machines across separate storage locations. If you need to make your arrays span fewer disks in order to divide your VMs’ storage, you will have a net loss in performance. If you are creating multiple LUNs or partitions across the same disks to divide up VMs, you will have a net loss in performance.
  8. For RDS virtual machine-based VDI, use hardware-based or Windows’ Hyper-V-mode deduplication on the storage system. The read hits, especially with caching, yield positive performance benefits.
  9. The jury is still out on using host-level deduplication for Windows Server guests, but it is supported with 2016. I personally will be trying to place Server OS disks on SMB storage deduplicated in Hyper-V mode.
  10. The slowest component in a storage system is the disk(s); don’t spend a lot of time worrying about controllers beyond enabling caching.
  11. RAID-0 is the fastest RAID type, but provides no redundancy.
  12. RAID-10 is generally the fastest RAID type that provides redundancy.
  13. For Storage Spaces, three-way mirror is fastest (by a lot).
  14. For remote storage, prefer MPIO or SMB multichannel over multiple unteamed adapters. Avoid placing this traffic on teamed adapters.
  15. I’ve read some scattered notes that say that you should format with 64 kilobyte allocation units. I have never done this, mostly because I don’t think about it until it’s too late. If the default size hurts anything, I can’t tell. Someday, I’ll remember to try it and will update this article after I’ve gotten some performance traces. If you’ll be hosting a lot of SQL VMs and will be formatting their VHDX with 64 KB allocation units, then you might get more benefit (see the sketch after this list).
  16. I still don’t think that ReFS is quite mature enough to replace NTFS for Hyper-V. For performance, I definitely stick with NTFS.
  17. Don’t do full defragmentation. It doesn’t help. The minimal defragmentation that Windows automatically performs is all that you need. If you have some crummy application that makes this statement false, then stop using that application or exile it to its own physical server. Defragmentation’s primary purpose is to wear down your hard drives so that you have to buy more hard drives sooner than necessary, which is why employees of hardware vendors recommend it all the time. If you have a personal neurosis that causes you pain when a disk becomes “too” fragmented, use Storage Live Migration to clear and then re-populate partitions/LUNs. It’s wasted time that you’ll never get back, but at least it’s faster. Note: All retorts must include verifiable and reproducible performance traces, or I’m just going to delete them.
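
If you do experiment with the 64 KB allocation units from point 15, the operation itself is a one-liner. The drive letter is a placeholder, and formatting erases data, so only run it against an empty volume:

    # Format a dedicated VM/SQL data volume with 64 KB allocation units
    Format-Volume -DriveLetter 'V' -FileSystem NTFS -AllocationUnitSize 65536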

Clustering

For real performance, don’t cluster virtual machines. Use fast internal or direct-attached SSDs. Cluster for redundancy, not performance. Use application-level redundancy techniques instead of relying on Hyper-V clustering.

In the modern cloud era, though, most software doesn’t have its own redundancy and host clustering is nearly a requirement. Follow these best practices:

  1. Validate your cluster. You may not need to fix every single warning, but be aware of them.
  2. Follow our article on host tuning. Especially pay attention to the bits on caching storage. It includes a link to enable CSV caching.
  3. Remember your initial density target. Add as many nodes as necessary to maintain that along with sufficient extra nodes for failure protection.
  4. Use the same hardware in each node. You can mix hardware, but CPU compatibility mode and mismatched NUMA nodes will have at least some impact on performance.
  5. For Hyper-V, every cluster node should use a minimum of two separate IP endpoints. Each IP must exist in a separate subnet. This practice allows the cluster to establish multiple simultaneous network streams for internode traffic.
    • One of the addresses must be designated as a “management” IP, meaning that it must have a valid default gateway and register in DNS. Inbound connections (such as your own RDP and PowerShell Remoting) will use that IP.
    • None of the non-management IPs should have a default gateway or register in DNS.
    • One alternative IP endpoint should be preferred for Live Migration. Cascade Live Migration preference order through the others, ending with the management IP. You can configure this setting most easily in Failover Cluster Manager by right-clicking on the Networks node.
    • Further IP endpoints can be used to provide additional pathways for cluster communications. Cluster communications include the heartbeat, cluster status and update messages, and Cluster Shared Volume information and Redirected Access traffic.
    • You can set any adapter to be excluded from cluster communications but included in Live Migration in order to enforce segregation. Doing so generally does not improve performance, but may be desirable in some cases.
    • You can use physical or virtual network adapters to host cluster IPs.
    • The IP for each cluster adapter must exist in a unique subnet on that host.
    • Each cluster node must contain an IP address in the same subnet as the IPs on other nodes. If a node does not contain an IP in a subnet that exists on other nodes, then that network will be considered “partitioned” and the node(s) without a member IP will be excluded from that network.
    • If the host will connect to storage via iSCSI, segregate iSCSI traffic onto its own IP(s). Exclude it/them from cluster communications and Live Migration. Because they don’t participate in cluster communications, it is not absolutely necessary that they be placed into separate subnets. However, doing so will provide some protection from network storms.
  6. If you do not have RDMA-capable physical adapters, Compression usually provides the best Live Migration performance.
  7. If you do have RDMA-capable physical adapters, SMB usually provides the best Live Migration performance (see the sketch after this list).
  8. I don’t recommend spending time tinkering with the metric to shape CSV traffic anymore. It utilizes SMB, so the built-in SMB multi-channel technology can sort things out.
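
For points 6 and 7, the Live Migration performance option is a per-host setting. A sketch; pick the one that matches your hardware:

    # Hosts without RDMA-capable adapters
    Set-VMHost -VirtualMachineMigrationPerformanceOption Compression

    # Hosts with RDMA-capable adapters
    Set-VMHost -VirtualMachineMigrationPerformanceOption SMB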

Virtual Machines

The preceding guidance obliquely covers several virtual machine configuration points (check the CPU and the memory sections). We have a few more:

  1. Don’t use Shielded VMs or BitLocker. The encryption and VMWP hardening incur overhead that will hurt performance. The hit is minimal — but this article is about performance.
  2. If you have 1) VMs with very high inbound networking needs, 2) physical NICs >= 10GbE, 3) VMQ enabled, 4) spare CPU cycles, then enable RSS within the guest operating systems (see the sketch after this list). Do not enable RSS in the guest OS unless all of the preceding are true.
  3. Do not use the legacy network adapter in Generation 1 VMs any more than absolutely necessary.
  4. Utilize checkpoints rarely and briefly. Know the difference between standard and “production” checkpoints.
  5. Use time synchronization appropriately. Meaning, virtual domain controllers should not have the Hyper-V time synchronization service enabled, but all other VMs should (generally speaking). The hosts should pull their time from the domain hierarchy. If possible, the primary domain controller should be pulling from a secured time source.
  6. Keep Hyper-V guest services up-to-date. Supported Linux systems can be updated via kernel upgrades/updates from their distribution repositories. Windows 8.1+ and Windows Server 2012 R2+ will update from Windows Update.
  7. Don’t do full defragmentation in the guests, either. Seriously. We’re administering multi-spindle server equipment here, not displaying a progress bar to someone with a 5400-RPM laptop drive so that they feel like they’re accomplishing something.
  8. If the virtual machine’s primary purpose is to run an application that has its own replication technology, don’t use Hyper-V Replica. Examples: Active Directory and Microsoft SQL Server. Such applications will replicate themselves far more efficiently than Hyper-V Replica.
  9. If you’re using Hyper-V Replica, consider moving the VMs’ page files to their own virtual disk and excluding it from the replica job. If you have a small page file that doesn’t churn much, that might cost you more time and effort than you’ll recoup.
  10. If you’re using Hyper-V Replica, enable compression if you have spare CPU but leave it disabled if you have spare network. If you’re not sure, use compression.
  11. If you are shipping your Hyper-V Replica traffic across an encrypted VPN or keeping its traffic within secure networks, use Kerberos. Certificate-based (SSL) en/decryption requires CPU and adds overhead to the replica traffic that crosses the wire.
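
For point 2, RSS is enabled inside the guest operating system, not on the host. A sketch, with the adapter name as a placeholder and only applicable when all four conditions are met:

    # Run inside the guest OS
    Enable-NetAdapterRss -Name 'Ethernet'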

Monitoring

You must monitor your systems. Monitoring is not, and has never been, an optional activity.

  1. Be aware of Hyper-V-specific counters (an example follows this list). Many people try to use Task Manager in the management operating system to gauge guest CPU usage, but it just doesn’t work. The management operating system is a special-case virtual machine, which means that it is using virtual CPUs. Its Task Manager cannot see what the guests are doing.
  2. Performance Monitor has the most power of any built-in tool, but it’s tough to use. Look at something like Performance Analysis of Logs (PAL) tool, which understands Hyper-V.
  3. In addition to performance monitoring, employ state monitoring. With that, you no longer have to worry (as much) about surprise events like disk space or memory filling up. I like Nagios, as regular readers already know, but you can select from many packages.
  4. Take periodic performance baselines and compare them to earlier baselines.
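
For point 1, a quick example of reading a Hyper-V-specific counter instead of relying on Task Manager (counter names can vary slightly between versions):

    # Host CPU consumed across all of the hypervisor's logical processors
    Get-Counter -Counter '\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time'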

 

If you’re able to address a fair proportion of the points on this list, I’m sure you’ll see a boost in Hyper-V performance. Don’t forget that this list is not exhaustive; I’ll be adding to it periodically to keep it as comprehensive as possible. If you think something is missing, let me know in the comments below and you may see the number 95 increase!

Get Involved on Twitter: #How2HyperV

Get involved on Twitter, where we will be regularly posting excerpts from this article and engaging the IT community to help each other improve our use of Hyper-V. Got your own Hyper-V tips or tricks for boosting performance? Use the hashtag #How2HyperV when you tweet and share your knowledge with the world!

How to Perform Hyper-V Storage Migration

New servers? New SAN? Trying out hyper-convergence? Upgrading to Hyper-V 2016? Any number of conditions might prompt you to move your Hyper-V virtual machine’s storage to another location. Let’s look at the technologies that enable such moves.

An Overview of Hyper-V Migration Options

Hyper-V offers numerous migration options. Each has its own distinctive features. Unfortunately, we in the community often muck things up by using incorrect and confusing terminology. So, let’s briefly walk through the migration types that Hyper-V offers:

  • Quick migration: Cluster-based virtual machine migration that involves placing a virtual machine into a saved state, transferring ownership to another node in the same cluster, and resuming the virtual machine. A quick migration does not involve moving anything that most of us consider storage.
  • Live migration: Cluster-based virtual machine migration that involves transferring the active state of a running virtual machine to another node in the same cluster. A Live Migration does not involve moving anything that most of us consider storage.
  • Storage migration: Any technique that utilizes the Hyper-V management service to relocate any file-based component that belongs to a virtual machine. This article focuses on this migration type, so I won’t expand any of those thoughts in this list.
  • Shared Nothing Live Migration: Hyper-V migration technique between two hosts that does not involve clustering. It may or may not include a storage migration. The virtual machine might or might not be running. However, this migration type always includes ownership transfer from one host to another.

It Isn’t Called Storage Live Migration

I have always called this operation “Storage Live Migration”. I know lots of other authors call it “Storage Live Migration”. But, Microsoft does not call it “Storage Live Migration”. They just call it “Storage Migration”. The closest thing that I can find to “Storage Live Migration” in anything from Microsoft is a 2012 TechEd recording by Benjamin Armstrong. The title of that presentation includes the phrase “Live Storage Migration”, but I can’t determine if the “Live” just modifies “Storage Migration” or if Ben uses it as part of the technology name. I suppose I could listen to the entire hour and a half presentation, but I’m lazy. I’m sure that it’s a great presentation, if anyone wants to listen and report back.

Anyway, does it matter? I don’t really think so. I’m certainly not going to correct anyone that uses that phrase. However, the virtual machine does not necessarily need to be live. We use the same tools and commands to move a virtual machine’s storage whether it’s online or offline. So, “Storage Migration” will always be a correct term. “Storage Live Migration”, not so much. However, we use the term “Shared Nothing Live Migration” for virtual machines that are turned off, so we can’t claim any consistency.

What Can Be Moved with Hyper-V Storage Migration?

When we talk about virtual machine storage, most people think of the places where the guest operating system stores its data. That certainly comprises the physical bulk of virtual machine storage. However, it’s also only one bullet point on a list of multiple components that form a virtual machine.

Independently, you can move any of these virtual machine items:

  • The virtual machine’s core files (its configuration in .xml or .vmcx format, along with the .bin, .vsv, and similar files)
  • The virtual machine’s checkpoints (essentially the same items as the preceding bullet point, but for the checkpoint(s) instead of the active virtual machine)
  • The virtual machine’s second-level paging file location. I have not tested to see if it will move a VM with active second-level paging files, but I have no reason to believe that it wouldn’t
  • Virtual hard disks attached to a virtual machine
  • ISO images attached to a virtual machine

We most commonly move all of these things together. Hyper-V doesn’t require that, though. Also, we can move all of these things in the same operation but distribute them to different destinations.

What Can’t Be Moved with Hyper-V Storage Migration?

In terms of storage, we can move everything related to a virtual machine. But, we can’t move the VM’s active, running state with Storage Migration. Storage Migration is commonly partnered with a Live Migration in the operation that we call “Shared Nothing Live Migration”. To avoid getting bogged down in implementation details that are more academic than practical, just understand one thing: when you pick the option to move the virtual machine’s storage, you are not changing which Hyper-V host owns and runs the virtual machine.

More importantly, you can’t use any Microsoft tool-based technique to separate a differencing disk from its parent. So, if you have an AVHDX (differencing disk created by the checkpointing mechanism) and you want to move it away from its source VHDX, Storage Migration will not do it. If you instruct Storage Migration to move the AVHDX, the entire disk chain goes along for the ride.

Uses for Hyper-V Storage Migration

Out of all the migration types, storage migration has the most applications and special conditions. For instance, Storage Migration is the only Hyper-V migration type that does not always require domain membership. Granted, the one exception to the domain membership rule won’t be very satisfying for people that insist on leaving their Hyper-V hosts in insecure workgroup mode, but I’m not here to please those people. I’m here to talk about the nuances of Storage Migration.

Local Relocation

Let’s start with the simplest usage: relocation of local VM storage. Some situations in this category:

  • You left VMs in the default “C:\ProgramData\Microsoft\Windows\Hyper-V” and/or “C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks” locations and you don’t like it
  • You added new internal storage as a separate volume and want to re-distribute your VMs
  • You have storage speed tiers but no active management layer
  • You don’t like the way your VMs’ files are laid out
  • You want to defragment VM storage space. It’s a waste of time, but it works.

Network Relocation

With so many ways to do network storage, it’s nearly a given that we’ll all need to move a VHDX across ours at some point. Some situations:

  • You’re migrating from local storage to network storage
  • You’re replacing a SAN or NAS and need to relocate your VMs
  • You’ve expanded your network storage and want to redistribute your VMs

Most of the reasons listed under “Local Relocation” can also apply to network relocation.

Cluster Relocation

We can’t always build our clusters perfectly from the beginning. For the most part, a cluster’s relocation needs list will look like the local and network lists above. A few others:

  • Your cluster has new Cluster Shared Volumes that you want to expand into
  • Existing Cluster Shared Volumes have a data distribution that does not balance well. Remember that data access from a CSV owner node is slightly faster than from a non-owner node

The reasons matter less than the tools when you’re talking about clusters. You can’t use the same tools and techniques to move virtual machines that are protected by Failover Clustering under Hyper-V as you use for non-clustered VMs.

Turning the VM Off Makes a Difference for Storage Migration

You can perform a very simple experiment: perform a Storage Migration for a virtual machine while it’s on, then turn it off and migrate it back. The virtual machine will move much more quickly while it’s off. This behavior can be explained in one word: synchronization.

When the virtual machine is off, a Storage Migration is essentially a monitored file copy. The ability of the constituent parts to move bits from source to destination sets the pace of the move. When the virtual machine is on, all of the rules change. The migration is subjected to these constraints:

  • The virtual machine’s operating system must remain responsive
  • Writes must be properly captured
  • Reads must occur from the most appropriate source

Even if the guest operating system does not experience much activity during the move, that condition cannot be taken as a constant. In other words, Hyper-V needs to be ready for it to start demanding lots of I/O at any time.

So, the Storage Migration of a running virtual machine will always take longer than the Storage Migration of a virtual machine in an off or saved state. You can choose the convenience of an online migration or the speed of an offline migration.

Note: You can usually change a virtual machine’s power state during a Storage Migration. It’s less likely to work if you are moving across hosts.

How to Perform Hyper-V Storage Migration with PowerShell

The nice thing about using PowerShell for Storage Migration: it works for all Storage Migration types. The bad thing about using PowerShell for Storage Migration: it can be difficult to get all of the pieces right.

The primary cmdlet to use is Move-VMStorage. If you will be performing a Shared Nothing Live Migration, you can also use Move-VM. The parts of Move-VM that pertain to storage match Move-VMStorage. Move-VM has uses, requirements, and limitations that don’t pertain to the topic of this article, so I won’t cover Move-VM here.

A Basic Storage Migration in PowerShell

Let’s start with an easy one. Use this when you just want all of a VM’s files to be in one place:
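
The command looks something like this (the VM name and destination path match the description that follows):

    Move-VMStorage -VMName 'testvm' -DestinationStoragePath 'C:\LocalVMs'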

This will move the virtual machine named testvm so that all of its components reside under the C:\LocalVMs folder. That means:

  • The configuration files will be placed in C:\LocalVMs\Virtual Machines
  • The checkpoint files will be placed in C:\LocalVMs\Snapshots
  • The VHDXs will be placed in C:\LocalVMs\Virtual Hard Disks
  • Depending on your version, an UndoLog Configuration folder will be created if it doesn’t already exist. The folder is meant to contain Hyper-V Replica files. It may be created even for virtual machines that aren’t being replicated.

Complex Storage Migrations in PowerShell

For more complicated move scenarios, you won’t use the DestinationStoragePath parameter. You’ll use one or more of the individual component parameters. Choose from the following:

  • VirtualMachinePath: Where to place the VM’s configuration files.
  • SnapshotFilePath: Where to place the VM’s checkpoint files (again, NOT the AVHDXs!)
  • SmartPagingFilePath: Where to place the VM’s smart paging files
  • Vhds: An array of hash tables that indicate where to place individual VHD/X files.

Some notes on these items:

  • You are not required to use all of these parameters. If you do not specify a parameter, then its related component is left alone. Meaning, it doesn’t get moved at all.
  • If you’re trying to use this to get away from those auto-created Virtual Machines and Snapshots folders, it doesn’t work. They’ll always be created as sub-folders of whatever you type in.
  • It doesn’t auto-create a Virtual Hard Disks folder.
  • If you were curious whether or not you needed to specify those auto-created subfolders, the answer is: no. Move-VMStorage will always create them for you (unless they already exist).
  • The VHDs hash table is the hardest part of this whole thing. I’m usually a PowerShell-first kind of guy, but even I tend to go to the GUI for Storage Migrations.

The following will move all components except VHDs, which I’ll tackle in the next section:
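
A sketch of such a move; the destination paths here are placeholders:

    Move-VMStorage -VMName 'testvm' -VirtualMachinePath 'D:\LocalVMs\testvm' -SnapshotFilePath 'D:\LocalVMs\testvm' -SmartPagingFilePath 'D:\LocalVMs\testvm'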

Move-VMStorage’s Array of Hash Tables for VHDs

The three …FilePath parameters are easy: just specify the path. The Vhds parameter is tougher. It is one or more hash tables inside an array.

First, the hash tables. A hash table is a custom object that looks like an array, but each entry has a unique name. The hash tables that Vhds expects have a SourceFilePath entry and a DestinationFilePath entry. Each must be fully-qualified for a file. A hash table is contained like this: @{ }. The name of an entry and its value are joined with an =. Entries are separated by a ; So, if you want to move the VHDX named svtest.vhdx from \\svstore\VMs to C:\LocalVMs\testvm, you’d use this hash table:
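
Assembled from those pieces (every name here comes straight from the example just described):

    @{ SourceFilePath = '\\svstore\VMs\svtest.vhdx'; DestinationFilePath = 'C:\LocalVMs\testvm\svtest.vhdx' }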

Reading that, you might ask (quite logically): “Can I change the name of the VHDX file when I move it?” The answer: No, you cannot. So, why then do you need to enter the full name of the destination file? I don’t know!

Next, the arrays. An array is bounded by @( ). Its entries are separated by commas. So, to move two VHDXs, you would do something like this:
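
A sketch (the second file name, svtest2.vhdx, is made up purely for illustration):

    Move-VMStorage -VMName 'testvm' -Vhds @(
        @{ SourceFilePath = '\\svstore\VMs\svtest.vhdx'; DestinationFilePath = 'C:\LocalVMs\testvm\svtest.vhdx' },
        @{ SourceFilePath = '\\svstore\VMs\svtest2.vhdx'; DestinationFilePath = 'C:\LocalVMs\testvm\svtest2.vhdx' }
    )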

I broke that onto multiple lines for legibility. You can enter it all on one line. Note where I used parenthesis and where I used curly braces.

Tip: To move a single VHDX file, you don’t need to do the entire array notation. You can use the first example with Vhds.

A Practical Move-VMStorage Example with Vhds

If you’re looking at all that and wondering why you’d ever use PowerShell for such a thing, I have the perfect answer: scripting. Don’t do this by hand. Use it to move lots of VMs in one fell swoop. If you want to see a plain example of the Vhds parameter in action, the Get-Help examples show one. I’ve got a more practical script in mind.

The following would move all VMs on the host. All of their config, checkpoint, and second-level paging files will be placed on a share named “\\vmstore\slowstorage”. All of their VHDXs will be placed on a share named “\\vmstore\faststorage”. We will have PowerShell deal with the source paths and file names.
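
A sketch of such a script; the share names come from the description above, while the folder layout underneath them and the rest of the logic are illustrative:

    foreach ($VM in Get-VM)
    {
        $SlowPath = '\\vmstore\slowstorage\{0}' -f $VM.Name
        $MoveParameters = @{
            VM                  = $VM
            VirtualMachinePath  = $SlowPath
            SnapshotFilePath    = $SlowPath
            SmartPagingFilePath = $SlowPath
        }
        $VhdMoves = @()
        # Skip pass-through disks, which have no file path
        foreach ($Drive in ($VM.HardDrives | Where-Object -FilterScript { $_.Path }))
        {
            $VhdMoves += @{
                SourceFilePath      = $Drive.Path
                DestinationFilePath = ('\\vmstore\faststorage\{0}\{1}' -f $VM.Name, (Split-Path -Path $Drive.Path -Leaf))
            }
        }
        if ($VhdMoves.Count -gt 0)    # a VM without virtual hard disks gets no Vhds parameter
        {
            $MoveParameters.Add('Vhds', $VhdMoves)
        }
        Move-VMStorage @MoveParameters
    }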

I used splatting for the parameters for two reasons: 1, legibility. 2, to handle VMs without any virtual hard disks.

How to Perform Hyper-V Storage Migration with Hyper-V Manager

Hyper-V Manager can only be used for non-clustered virtual machines. It utilizes a wizard format. To use it to move a virtual machine’s storage:

  1. Right-click on the virtual machine and click Move.
  2. Click Next on the introductory page.
  3. Change the selection to Move the virtual machine’s storage (the same storage options would be available if you moved the VM’s ownership, but that’s not part of this article)
    [Screenshot: Move Wizard, Choose Move Type page]
  4. Choose how to perform the move. You can move everything to the same location, you can move everything to different locations, or you can move only the virtual hard disks.
    [Screenshot: Move Wizard, Choose Options for Moving Storage page]
  5. What screens you see next will depend on what you chose. We’ll cover each branch.

If you opt to move everything to one location, the wizard will show you this simple page:

[Screenshot: Move Wizard, single destination location page]

If you choose the option to Move the virtual machine’s data to different locations, you will first see this screen:

[Screenshot: Move Wizard, Select Items to Move page]

For every item that you check, you will be given a separate screen where you indicate the desired location for that item. The wizard uses the same screen for these items as it does for the hard-disks only option. I’ll show its screen shot next.

If you choose Move only the virtual machine’s virtual hard disks, then you will be given a sequence of screens where you instruct it where to move the files. These are the same screens used for the individual components from the previous selection:

[Screenshot: Move Wizard, destination selection for an individual item]

After you make your selections, you’ll be shown a summary screen where you can click Finish to perform the move:

[Screenshot: Move Wizard, summary page]

How to Perform Hyper-V Storage Migration with Failover Cluster Manager

Failover Cluster Manager uses a slick single-screen interface to move storage for cluster virtual machines. To access it, simply right-click a virtual machine, hover over Move, and click Virtual Machine Storage. You’ll see the following screen:

[Screenshot: Failover Cluster Manager, Move Virtual Machine Storage dialog]

If you just want to move the whole thing to one of the displayed Cluster Shared Volumes, just drag and drop it down to that CSV under the Cluster Storage heading at the lower left. You can drag and drop individual items or the entire VM. The Destination Folder Path will be populated accordingly.

As you can see in mine, I have all of the components except the VHD on an SMB share. I want to move the VHD to be with the rest. To get a share to show up, click the Add Share button. You’ll get this dialog:

[Screenshot: Add Share dialog]

The share will populate underneath the CSVs in the lower left. Now, I can drag and drop that file to the share. View the differences:

[Screenshot: Move Virtual Machine Storage dialog with the share added]

Once you have the dialog the way that you like it, click Start.

6 Hardware Tweaks that will Skyrocket your Hyper-V Performance

Few Hyper-V topics burn up the Internet quite like “performance”. No matter how fast it goes, we always want it to go faster. If you search even a little, you’ll find many articles with long lists of ways to improve Hyper-V’s performance. The less focused articles start with general Windows performance tips and sprinkle some Hyper-V-flavored spice on them. I want to use this article to tighten the focus down on Hyper-V hardware settings only. That means it won’t be as long as some others; I’ll just think of that as wasting less of your time.

1. Upgrade your system

I guess this goes without saying but every performance article I write will always include this point front-and-center. Each piece of hardware has its own maximum speed. Where that speed barrier lies in comparison to other hardware in the same category almost always correlates directly with cost. You cannot tweak a go-cart to outrun a Corvette without spending at least as much money as just buying a Corvette — and that’s without considering the time element. If you bought slow hardware, then you will have a slow Hyper-V environment.

Fortunately, this point has a corollary: don’t panic. Production systems, especially server-class systems, almost never experience demand levels that compare to the stress tests that admins put on new equipment. If typical load levels were that high, it’s doubtful that virtualization would have caught on so quickly. We use virtualization for so many reasons nowadays, we forget that “cost savings through better utilization of under-loaded server equipment” was one of the primary drivers of early virtualization adoption.

2. BIOS Settings for Hyper-V Performance

Don’t neglect your BIOS! It contains some of the most important settings for Hyper-V.

  • C States. Disable C States! Few things impact Hyper-V performance quite as strongly as C States! Names and locations will vary, so look in areas related to Processor/CPU, Performance, and Power Management. If you can’t find anything that specifically says C States, then look for settings that disable/minimize power management. C1E is usually the worst offender for Live Migration problems, although other modes can cause issues.
  • Virtualization support: A number of features have popped up through the years, but most BIOS manufacturers have since consolidated them all into a global “Virtualization Support” switch, or something similar. I don’t believe that current versions of Hyper-V will even run if these settings aren’t enabled. Here are some individual component names, for those special BIOSs that break them out:
    • Virtual Machine Extensions (VMX)
    • AMD-V — AMD CPUs/mainboards. Be aware that Hyper-V can’t (yet?) run nested virtual machines on AMD chips
    • VT-x, or sometimes just VT — Intel CPUs/mainboards. Required for nested virtualization with Hyper-V in Windows 10/Server 2016
  • Data Execution Prevention: DEP means less for performance and more for security. It’s also a requirement. But, we’re talking about your BIOS settings and you’re in your BIOS, so we’ll talk about it. Just make sure that it’s on. If you don’t see it under the DEP name, look for:
    • No Execute (NX) — AMD CPUs/mainboards
    • Execute Disable (XD) — Intel CPUs/mainboards
  • Second Level Address Translation: I’m including this for completeness. It’s been many years since any system was built new without SLAT support. If you have such a system, following every point in this post to the letter still won’t make it fast. Starting with Windows 8 and Server 2016, you cannot use Hyper-V without SLAT support. (A quick way to check virtualization, DEP, and SLAT support from within Windows appears after this list.) Names that you will see SLAT under:
    • Nested Page Tables (NPT)/Rapid Virtualization Indexing (RVI) — AMD CPUs/mainboards
    • Extended Page Tables (EPT) — Intel CPUs/mainboards
  • Disable power management. This goes hand-in-hand with C States. Just turn off power management altogether. Get your energy savings via consolidation. You can also buy lower wattage systems.
  • Use Hyperthreading. I’ve seen a tiny handful of claims that Hyperthreading causes problems on Hyper-V. I’ve heard more convincing stories about space aliens. I’ve personally seen the same number of space aliens as I’ve seen Hyperthreading problems with Hyper-V (that would be zero). If you’ve legitimately encountered a problem that was fixed by disabling Hyperthreading AND you can prove that it wasn’t a bad CPU, that’s great! Please let me know. But remember, you’re still in a minority of a minority of a minority. The rest of us will run Hyperthreading.
  • Disable SCSI BIOSs. Unless you are booting your host from a SAN, kill the BIOSs on your SCSI adapters. Leaving them enabled doesn’t do anything good or bad for a running Hyper-V host, but it does slow down physical boot times.
  • Disable BIOS-set VLAN IDs on physical NICs. Some network adapters support VLAN tagging through boot-up interfaces. If you then bind a Hyper-V virtual switch to one of those adapters, you could encounter all sorts of network nastiness.
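For the virtualization, DEP, and SLAT items above, you can check what Windows sees before rebooting into the BIOS. This is only a minimal sketch; Get-ComputerInfo ships with 2016-era PowerShell, and systeminfo reports the same data under “Hyper-V Requirements”:

    # Reports firmware virtualization support, SLAT, and DEP as Windows sees them.
    # Once the Hyper-V role is active, these fields go blank and HyperVisorPresent reads True.
    Get-ComputerInfo -Property "Hyper*"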

3. Storage Settings for Hyper-V Performance

I wish the IT world would learn to cope with the fact that rotating hard disks do not move data very quickly. If you just can’t cope with that, buy a gigantic lot of them and make big RAID 10 arrays. Or, you could get a stack of SSDs. Don’t get six or so spinning disks and get sad that they “only” move data at a few hundred megabytes per second. That’s how the tech works.

Performance tips for storage:

  • Learn to live with the fact that storage is slow.
  • Remember that speed tests do not reflect real world load and that file copy does not test anything except permissions.
  • Learn to live with Hyper-V’s I/O scheduler. If you want a computer system to have 100% access to storage bandwidth, start by checking your assumptions. Just because a single file copy doesn’t go as fast as you think it should, does not mean that the system won’t perform its production role adequately. If you’re certain that a system must have total and complete storage speed, then do not virtualize it. The only way that a VM can get that level of speed is by stealing I/O from other guests.
  • Enable read caches
  • Carefully consider the potential risks of write caching. If acceptable, enable write caches. If your internal disks, DAS, SAN, or NAS has a battery backup system that can guarantee clean cache flushes on a power outage, write caching is generally safe. Internal batteries that report their status and/or automatically disable caching are best. UPS-backed systems are sometimes OK, but they are not foolproof.
  • Prefer few arrays with many disks over many arrays with few disks.
  • Unless you’re going to store VMs on a remote system, do not create an array just for Hyper-V. By that, I mean that if you’ve got six internal bays, do not create a RAID-1 for Hyper-V and a RAID-x for the virtual machines. That’s a Microsoft SQL Server 2000 design. This is 2017 and you’re building a Hyper-V server. Use all the bays in one big array.
  • Do not architect your storage to make the hypervisor/management operating system go fast. I can’t believe how many times I read on forums that Hyper-V needs lots of disk speed. After boot-up, it needs almost nothing. The hypervisor remains resident in memory. Unless you’re doing something questionable in the management OS, it won’t even page to disk very often. Architect storage speed in favor of your virtual machines.
  • Set your fibre channel SANs to use very tight WWN masks. Live Migration requires a hand off from one system to another, and the looser the mask, the longer that takes. With 2016 the guests shouldn’t crash, but the hand-off might be noticeable.
  • Keep iSCSI/SMB networks clear of other traffic. I see a lot of recommendations to put each and every iSCSI NIC on a system into its own VLAN and/or layer-3 network. I’m on the fence about that; a network storm in one iSCSI network would probably justify it. However, keeping those networks quiet would go a long way on its own. For clustered systems, multi-channel SMB needs each adapter to be on a unique layer 3 network (according to the docs; from what I can tell, it works even with same-net configurations).
  • If using gigabit, try to physically separate iSCSI/SMB from your virtual switch. Meaning, don’t make that traffic endure the overhead of virtual switch processing, if you can help it.
  • Round robin MPIO might not be the best, although it’s the most recommended. If you have one of the aforementioned network storms, Round Robin will negate some of the benefits of VLAN/layer 3 segregation. I like least queue depth, myself. (A sketch of making that the default appears after this list.)
  • MPIO and SMB multi-channel are much faster and more efficient than the best teaming.
  • If you must run MPIO or SMB traffic across a team, create multiple virtual or logical NICs. It will give the teaming implementation more opportunities to create balanced streams.
  • Use jumbo frames for iSCSI/SMB connections if everything supports it (host adapters, switches, and back-end storage). You’ll improve the header-to-payload bit ratio by a meaningful amount.
  • Enable RSS on SMB-carrying adapters. If you have RDMA-capable adapters, absolutely enable that.
  • Use dynamically-expanding VHDX, but not dynamically-expanding VHD. I still see people recommending fixed VHDX for operating system VHDXs, which is just absurd. Fixed VHDX is good for high-volume databases, but mostly because they’ll probably expand to use all the space anyway. Dynamic VHDX enjoys higher average write speeds because it completely ignores zero writes. No defined pattern has yet emerged that declares a winner on read rates, but people who say that fixed always wins are making demonstrably false assumptions.
  • Do not use pass-through disks. The performance is sometimes a little bit better, but sometimes it’s worse, and it almost always causes some other problem elsewhere. The trade-off is not worth it. Just add one spindle to your array to make up for any perceived speed deficiencies. If you insist on using pass-through for performance reasons, then I want to see the performance traces of production traffic that prove it.
  • Don’t let fragmentation keep you up at night. Fragmentation is a problem for single-spindle desktops/laptops, “admins” that never should have been promoted above first-line help desk, and salespeople selling defragmentation software. If you’re here to disagree, you better have a URL to performance traces that I can independently verify before you even bother entering a comment. I have plenty of Hyper-V systems of my own on storage ranging from 3-spindle up to >100 spindle, and the first time I even feel compelled to run a defrag (much less get anything out of it) I’ll be happy to issue a mea culpa. For those keeping track, we’re at 6 years and counting.
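For the MPIO point above: if you decide to move away from round robin, the in-box MPIO module can change the host-wide default policy. This is only a sketch, so verify it against your storage vendor’s guidance first:

    # Make Least Queue Depth (LQD) the default MSDSM load-balance policy on this host
    Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy LQD
    # Confirm the change
    Get-MSDSMGlobalDefaultLoadBalancePolicy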

4. Memory Settings for Hyper-V Performance

There isn’t much that you can do for memory. Buy what you can afford and, for the most part, don’t worry about it.

  • Buy and install your memory chips optimally. Multi-channel memory is somewhat faster than single-channel. Your hardware manufacturer will be able to help you with that.
  • Don’t over-allocate memory to guests. Just because your file server had 16GB before you virtualized it does not mean that it has any use for 16GB.
  • Use Dynamic Memory unless you have a system that expressly forbids it. It’s better to stretch your memory dollar farther than wring your hands about whether or not Dynamic Memory is a good thing. Until directly proven otherwise for a given server, it’s a good thing. (A one-line example of enabling it follows this list.)
  • Don’t worry so much about NUMA. I’ve read volumes and volumes on it. Even spent a lot of time configuring it on a high-load system. Wrote some about it. Never got any of that time back. I’ve had some interesting conversations with people that really did need to tune NUMA. They constitute… oh, I’d say about .1% of all the conversations that I’ve ever had about Hyper-V. The rest of you should leave NUMA enabled at defaults and walk away.
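For the Dynamic Memory point above, enabling it takes one line. The VM name and sizes below are examples only, so tune them to your workload; the VM must be off when you change this setting:

    # Example: enable Dynamic Memory on a stopped VM
    Set-VMMemory -VMName 'svtest' -DynamicMemoryEnabled $true -MinimumBytes 512MB -StartupBytes 1GB -MaximumBytes 4GB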

5. Network Settings for Hyper-V Performance

Networking configuration can make a real difference to Hyper-V performance.

  • Learn to live with the fact that gigabit networking is “slow” and that 10GbE networking often has barriers to reaching 10Gbps for a single test. Most networking demands don’t even bog down gigabit. It’s just not that big of a deal for most people.
  • Learn to live with the fact that a) your four-spindle disk array can’t fill up even one 10GbE pipe, much less the pair that you assigned to iSCSI and that b) it’s not Hyper-V’s fault. I know this doesn’t apply to everyone, but wow, do I see lots of complaints about how Hyper-V can’t magically pull or push bits across a network faster than a disk subsystem can read and/or write them.
  • Disable VMQ on gigabit adapters. I think some manufacturers are finally coming around to the fact that they have a problem. Too late, though. The purpose of VMQ is to redistribute inbound network processing for individual virtual NICs away from CPU 0, core 0 to the other cores in the system. Current-model CPUs are fast enough that they can handle many gigabit adapters. (A one-liner that disables VMQ on every gigabit adapter appears after this list.)
  • If you are using a Hyper-V virtual switch on a network team and you’ve disabled VMQ on the physical NICs, disable it on the team adapter as well. I’ve been saying that since shortly after 2012 came out and people are finally discovering that I’m right, so, yay? Anyway, do it.
  • Don’t worry so much about vRSS. RSS is like VMQ, only for non-VM traffic. vRSS, then, is the projection of VMQ down into the virtual machine. Basically, with traditional VMQ, the VMs’ inbound traffic is separated across pNICs in the management OS, but then each guest still processes its own data on vCPU 0. vRSS splits traffic processing across vCPUs inside the guest once it gets there. The “drawback” is that distributing processing and then redistributing processing causes more processing. So, the load is nicely distributed, but it’s also higher than it would otherwise be. The upshot: almost no one will care. Set it or don’t set it, it’s probably not going to impact you a lot either way. If you’re new to all of this, then you’ll find an “RSS” setting on the network adapter inside the guest. If that’s on in the guest (off by default) and VMQ is on and functioning in the host, then you have vRSS. woohoo.
  • Don’t blame Hyper-V for your networking ills. I mention this in the context of performance because your time has value. I’m constantly called upon to troubleshoot Hyper-V “networking problems” because someone is sharing MACs or IPs or trying to get traffic from the dark side of the moon over a Cat-3 cable with three broken strands. Hyper-V is also almost always blamed by people that just don’t have a functional understanding of TCP/IP. More wasted time that I’ll never get back.
  • Use one virtual switch. Multiple virtual switches cause processing overhead without providing returns. This is a guideline, not a rule, but you need to be prepared to provide an unflinching, sure-footed defense for every virtual switch in a host after the first.
  • Don’t mix gigabit with 10 gigabit in a team. Teaming will not automatically select 10GbE over the gigabit. 10GbE is so much faster than gigabit that it’s best to just kill gigabit and converge on the 10GbE.
  • 10x gigabit cards do not equal 1x 10GbE card. I’m all for only using 10GbE when you can justify it with usage statistics, but gigabit just cannot compete.
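For the VMQ item above, a possible one-liner; it assumes your gigabit adapters report a link speed of exactly “1 Gbps”, so check Get-NetAdapter output first:

    # Turn off VMQ on every physical adapter currently linked at gigabit speed
    Get-NetAdapter -Physical | Where-Object { $_.LinkSpeed -eq '1 Gbps' } | Disable-NetAdapterVmq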

6. Maintenance Best Practices

Don’t neglect your systems once they’re deployed!

  • Take a performance baseline when you first deploy a system and save it.
  • Take and save another performance baseline when your system reaches a normative load level (basically, once you’ve reached its expected number of VMs).
  • Keep drivers reasonably up-to-date. Verify that settings aren’t lost after each update.
  • Monitor hardware health. The Windows Event Log often provides early warning symptoms, if you have nothing else.

 

Further reading

If you carry out all (or as many as possible) of the above hardware adjustments, you will witness a considerable jump in your Hyper-V performance. That I can guarantee. However, for those who don’t have the time, the patience, or the budget to make the necessary investment, Altaro has developed an e-book just for you. Find out more about it here: Supercharging Hyper-V Performance for the time-strapped admin.

How to Use Hyper-V and Kali Linux to Securely Wipe a Hard Drive


The exciting time has come for my wife’s laptop to be replaced. After all the fun parts, we’ve still got this old laptop on our hands, though. Normally, we donate old computers to the local Goodwill. They’ll clean them up and sell them for a few dollars to someone else. Of course, we have no idea who will be getting the computer, and we don’t know what processes Goodwill puts them through before putting them on the shelf. A determined attacker might be able to retrieve social security numbers, bank logins, and other things that we’d prefer to keep private. As usual, I will wipe the hard drive prior to the donation. This time though, I have some new toys to use: Hyper-V and Kali Linux.

Why Use Hyper-V and Kali Linux to Securely Wipe a Physical Drive?

I am literally doing this because I can. You can easily find any number of other ways to wipe a drive. My reasons:

  • I don’t have any experience with Windows-based apps that wipe drives and didn’t find any freebies that spoke to me
  • I don’t really want to deal with booting this old laptop up to one of those security CDs
  • Kali Linux focuses on penetration testing, but Kali is also the name of the Hindu goddess of destruction. For a bit of fun, do an Internet image search on her, but maybe not around small children. What’s more appropriate than unleashing Kali on a disk you want to wipe?
  • I don’t want to deal with a Kali Live CD any more than I want to use one of the other CD-based tools, nor do I want to build a physical Kali box just for this. I already have Kali running in a virtual machine.
  • It’s very convenient for me to connect an external 2.5″ SATA disk to my Windows 10 system.

So yeah, I’m doing this mostly for fun.

Connect the Drive

I’m assuming that you’ve already got a Hyper-V installation with a Kali Linux guest. If not, get those first.

Since we’re working with a physical drive, you also need a way to physically connect the drive to the Hyper-V host. In my case, I have an old Seagate FreeAgent GoFlex that works perfectly for this. It has an enclosure for a small SATA drive and a detachable USB interface-to-SATA connector. I just pop off its drive, plug the connector into the laptop drive, and voila! I can connect her drive to my PC via USB.

how I connected the hard drive

You might need to come up with some other method, like cracking your case and connecting the cables. Hopefully not.

I plugged the disk into my Windows 10 system, and as expected, it appeared immediately. Next, I went into Disk Management and took the disk Offline.

hard disk management page

I then went into Hyper-V Manager and ensured the Kali guest was running. I opened its settings page to the SCSI Controller page. There, I clicked the Add button.

Adding the hard drive

It created a new logical connection and asked me if I wanted a new VHDX or to connect a physical disk. In this case, the physical disk is what we’re after.

select physical hard disk

After clicking OK, the disk immediately appeared in Kali.

In Kali, open the terminal from the launcher at the left:

Kali Linux terminal launch

Use lsblk to verify that Kali can see your disk. I already had my terminal open so that I could perform a before and after for you:

Kali Linux terminal

Remember that Linux marks the SATA disks in order as sda, sdb, sdc, etc. So, I know that the last disk that it detected is sdb, even if I hadn’t run the before and after.

Use shred to Perform the Wipe

Now that we’ve successfully connected the drive, we only need to perform the wipe. We’ll use the “shred” utility for that purpose. On other distributions, you’d usually need to install that from a repository. Kali already has it waiting for you, of course.

The shred utility has a number of options. Use shred --help to view them all. In my case, I want to view progress and I want to increase the number of passes from the default of 3 to 4. I’ve been told that analog readers can sometimes go as far as three layers deep. Apparently, even that is untrue. It seems that a single pass will do the trick. However, old paranoia dies hard. So, four passes it is.

I used:

Kali Linux
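The command amounted to something like the following (assuming the target disk is /dev/sdb, as lsblk showed in my case):

    # Overwrite the entire disk four times, showing progress along the way
    sudo shred -v -n 4 /dev/sdb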

And then, I found something else to do. As you can imagine, overwriting every spot on a 250GB laptop disk takes quite some time.

Because of the time involved, I needed to temporarily disable Windows 10 sleep mode. Otherwise, Connected Standby would interrupt the process.

disabling sleep mode
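If you prefer the command line over the Power Options GUI, an elevated prompt can do the same thing; this sketch sets the AC standby timeout to never, and you can put your normal value back once the wipe finishes:

    # 0 = never sleep while on AC power
    powercfg /change standby-timeout-ac 0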

After the process completed, I used Hyper-V Manager to remove the disk from the VM. Since I never mounted it in Kali, I didn’t need to do anything special there. After that, I bolted the drive back into the laptop. It’s on its way to its happy new owner, and I don’t need to worry about anyone stealing our information from it.

How to Avoid NTFS Permissions Problems During Hyper-V Live Migration


The title of this article describes the symptoms fairly well. You Live Migrate a virtual machine that’s backed by SMB storage, and the permissions shift in a way that prevents the virtual machine from being used. You’d have to be fairly sharp-eyed to notice before it causes problems, though. I didn’t catch on until virtual machines started failing because the hosts didn’t have sufficient permissions to start them. I don’t have a true fix, meaning that I can’t prevent the permissions from changing. However, I can show you how to eliminate the problem.

The root problem also affects local and Cluster Shared Volume locations, although the default permissions there generally keep it from causing blocking problems.

I have experienced the problem on both 2012 R2 and 2016. The Hyper-V host causes the problem, so the operating system running on the SMB system doesn’t matter.

Symptom of Broken NTFS Permissions for Hyper-V

I discovered the problem when one of my nodes went down for maintenance and all of its virtual machines crashed. It only affected my test cluster, which I don’t keep a close eye on. That means that I can’t tell you when this became a problem. I do know that this behavior is fairly new (sometime in late 2016 or 1Q/2Q 2017).

Symptom 1: Cluster event logs will fill up with the generic access denied (0x80070005) message.

For example, Hyper-V-VMMS; Event ID 20100:

Hyper-V-High-Availability; Event ID 21502:

You will also have several of the more generic FailoverClustering IDs 1069, 1205, and 1254 and Hyper-V-High-Availability IDs 21102 and 21111 as the cluster service desperately tries to sort out the problem.

Symptom 2: Virtual machines disappear from Hyper-V Manager on all nodes while still appearing in Failover Cluster Manager.

Because the cluster can’t register the virtual machine ID on the target Hyper-V host, you won’t see it in Hyper-V Manager. The cluster still knows about it though. Remember that, even if they’re named the same, the objects that you see as Roles in Failover Cluster Manager are different objects than what you see in Hyper-V Manager. Don’t panic! As long as the cluster still knows about the objects, it can still attempt to register them once you’ve addressed the underlying problem.

What Happened?

I’m guessing that “helper” behavior gone awry has caused unintentional problems. When you Live Migrate a virtual machine, Hyper-V tries to “fix” permissions, even when they’re not broken. It adjusts the NTFS permissions for the host.

The GUI ACL looks like this:

broken ntfs settings

The permission level that I set, and that I counsel everyone to set, is Full Control. As you can see, it’s been reduced. We click Advanced as the first investigative step and see:

broken ntfs advanced settings

The Access column still only tells us Special, but we can see that inheritance did not cause this. Whatever changes the permissions is making the changes directly on this folder. This is the same folder that’s shared via SMB. Double-clicking the entry and then clicking the Show advanced permissions link at the right shows us the new permission set:

broken ntfs new permissions

When I first found the permissions in this condition, I thought, “Huh, I wonder why/when I did that?” Then I set Full Control again. After the very next Live Migration, these permissions were back! Once I discovered that behavior, I tested other Live Migration types, such as using Cluster Shared Volumes. It does occur on those as well. However, the default permissions on CSVs have other entries that ensure that this particular issue does not prevent virtual machines from functioning. VMs on SMB shares don’t automatically have that kind of luck — but they can benefit from a similar configuration.

Permanently Correcting Live Migration NTFS Permission Problems

I don’t know why Hyper-V selects these particular permissions. I don’t know precisely which of those unchecked boxes cause these problems.

I do know how to prevent the problem from adversely affecting your virtual machines. In fact, even in the absence of the problem, I would label this as a “best practice” because it reduces overall administrative effort.

  1. In Active Directory (I’ll use Active Directory Users and Computers; you could also use PowerShell), create a new security group. For my test environment, I call mine “Hyper-V Hosts”. In a larger domain, you’ll likely want more granular groups. (A PowerShell sketch of these steps appears after the list.)
    broken ntfs group
  2. Select all of the Hyper-V hosts that you want in that new group. Right-click them and click Add to group.
    brokenntfs_hostlistadd
  3. In the Select Groups dialog, enter or browse to the group that you just created. Click OK to add them.
    broken ntfs group
  4. Restart the Workstation service on each of the Hyper-V hosts.
  5. On the target SMB system, add the new group to the ACL of the folder at the root of the share. I personally recommend that you change both SMB and NTFS permissions, although the problem only manifests on NTFS. Grant the group Full Control.
    broken ntfs virtual machines
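If you’d rather script the group work, here is a minimal sketch of the same steps. The group, host, domain, share, and folder names are examples only, so substitute your own; it assumes the ActiveDirectory module is available and that the permission commands run on the SMB server:

    # Create the group and add the Hyper-V host computer accounts
    New-ADGroup -Name 'Hyper-V Hosts' -GroupScope Global -GroupCategory Security
    Add-ADGroupMember -Identity 'Hyper-V Hosts' -Members (Get-ADComputer 'SVHV1'), (Get-ADComputer 'SVHV2')

    # On the SMB server: grant the group Full Control on both the share and the NTFS folder
    Grant-SmbShareAccess -Name 'VMs' -AccountName 'DOMAIN\Hyper-V Hosts' -AccessRight Full -Force
    icacls 'D:\Shares\VMs' /grant 'DOMAIN\Hyper-V Hosts:(OI)(CI)F'

Remember to still restart the Workstation service on each host afterward (step 4 above).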

You will now be able to Live Migrate and start virtual machines from this SMB share. If your virtual machines disappeared from Hyper-V Manager, use Failover Cluster Manager to start and/or Live Migrate them. It will take care of any missing registrations.

Why Does this Work?

Through group permissions, the same object can effectively appear multiple times in a single NTFS ACL (access control list). When that happens, NTFS grants the least restrictive set of permissions. So, while SVHV1’s specific ACE (access control entry) excludes Write attributes, the Hyper-V Hosts group’s ACE includes it. When NTFS accumulates all possible permissions that could apply to SVHV1, it will find an Allow entry for the Write attributes property (and others not set on the ACE specific to SVHV1). If it found a Deny anywhere, that would override any conflicting Allow. However, there are no Deny settings, so that single Allow wins.

Do remember that when a computer accesses an NTFS folder through an SMB share, the permissions on that share must be at least as permissive as NTFS in order for access to work as expected. So, if the SMB permission only allows Read, then it won’t matter that the NTFS allows Full Control. When NTFS permissions and SMB permissions must be evaluated together, the most restrictive cumulative effect applies. I’m mostly telling you this for completeness; Hyper-V will not modify SMB permissions. If they worked before, they’ll continue to work. However, I do recommend that you add the same group with Full Control permissions to the share.

As I mentioned before, I recommend that you adopt the group membership tactic whether you need it or not. When you commission new Hyper-V hosts, you’ll only need to add them to the appropriate groups for SMB access to work automatically. When you decommission servers, you won’t need to go around cleaning up broken SID ACEs.

How to Optimize Hyper-V Performance for Dell PowerEdge T20


 

A little while back, we published an eBook detailing how to build an inexpensive Hyper-V cluster. At that price point, you’re not going to find anything that breaks performance records. Such a system could meet the needs of a small business, though. For those of you lucky enough to have a more substantial budget, it also works well as a cheap test lab. Whatever your usage, the out-of-box performance can be improved.

The steps in this article were written using the hardware in the previously linked eBook. If you have a Dell T20 that uses a different build, you may not have access to the same options. You may also need to look elsewhere for guidance on configuring additional hardware that I do not have.

A little upfront note: Never expect software or tweaks to match better hardware. If you expect a few switches and tips to turn a T20 into competition for a latest generation PowerEdge R-series, you will leave disappointed. I am always amazed by people that buy budget hardware and then get upset because it acts like budget hardware. If you need much better performance, break out your wallet and buy better hardware.

Step 1: Disable C-States

The number one thing you should always do on all systems to improve Hyper-V performance: disable C-States. You make that change in the system’s BIOS. The T20’s relevant entry appears below. Just clear the box.

t20perf_cstates

I also recommend that you disable SpeedStep, although you probably won’t gain much by doing so.

Step 2: Update Drivers

I know, I know, updating drivers is the oldest of all so-called “performance enhancement” cliches. Bear with me, though. All of the hardware works just fine with Windows default drivers, but the drivers unlock some options that you’ll need.

Start at https://support.dell.com. You’ll be asked for the system’s service tag. At an elevated PowerShell prompt, enter gwmi win32_bios and look at the SerialNumber line:

t20perf_servicetag

Highlight and press [Enter] to copy it to the clipboard.

Select the Drivers and Downloads tab, then locate the Change OS link so that you can select the correct operating system. Dell periodically changes their support site, so you may see something different, but these named options have been the same for a while:

t20perf_driversystem

Items that you want:

  • BIOS (reboots without asking; stop your VMs first)
  • Chipset
  • Intel(R) Management Engine Components Installer
  • Intel Rapid Storage Technology Driver and Management Console
  • Intel Rapid Storage Technology F6 Driver

After gathering those files, go to Intel’s support site: https://downloadcenter.intel.com/.

This site also changes regularly. What I did was search for the “I217-LM”. On its list of downloads, I found the Intel Ethernet Adapter Connections CD. That includes drivers for just about every Intel network adapter in existence. If you have the system build that I described in the eBook, this file will update the onboard adapter and the add-in PRO/1000 PTs (and any other Intel network adapters that you might have chosen).

If you’re targeting a GUI-less system, unblock the files. An example:
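A minimal sketch, assuming the downloads landed in a folder named C:\Install:

    # Remove the "downloaded from the Internet" flag from everything in the folder
    Get-ChildItem -Path 'C:\Install' -Recurse | Unblock-File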

If you prefer the mouse, then you can use each item’s individual property dialog instead.

Also make sure that you use a GUI system to unzip the Intel CD prior to attempting to install on a GUI-less system.

I’m sure you can install drivers without my help. Do that and read on.

Step 3: Networking Performance Tweaks

Three things you want to do for networking:

  1. Enable jumbo frames for storage adapters
  2. Disable power management
  3. Disable VMQ on any team adapters

Enabling Jumbo Frames

First, make sure that jumbo frames are enabled on your physical switch. It’s always OK for a system to use smaller frames on equipment that has larger frames enabled. The other way around usually just causes excessive fragmentation. That hurts performance, but things still work. Sometimes, it causes Ethernet frames to never be delivered. Always configure your switch first. Many require a power cycle for the change to take effect.

Once jumbo frames are set on your switch, make the change on the T20’s physical adapters. You can make the change in the GUI or in PowerShell.

Enabling Jumbo Frames via the GUI

  1. In Network Connections, access an Intel adapter’s property sheet.
  2. Click the Configure button.
  3. Switch to the Advanced tab.
  4. Set Jumbo Packet to its highest number; usually 9014.

When you install the Intel network drivers and management pack, the I217-LM driver page will look like the following:

t20perf_i217jumbo

Intel adapters not under management will look like this:

t20perf_regularjumbo

Enabling Jumbo Frames in PowerShell

PowerShell makes this fast and simple:
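A sketch of the PowerShell approach, using the standardized *JumboPacket keyword; the adapter name is an example, and you should confirm the value your driver accepts with Get-NetAdapterAdvancedProperty first:

    # Show current jumbo packet values for all adapters
    Get-NetAdapterAdvancedProperty -RegistryKeyword '*JumboPacket'
    # Set one physical adapter to 9014-byte jumbo frames
    Set-NetAdapterAdvancedProperty -Name 'Ethernet' -RegistryKeyword '*JumboPacket' -RegistryValue 9014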

Disabling Network Adapter Power Management

Windows likes to turn off network adapters. Unfortunately, it doesn’t always do the best job ensuring that you’re not still using it. You can disable power management using the GUI or PowerShell.

Disabling Network Adapter Power Management in the GUI

Navigate to the adapter’s properties like you did to enable jumbo frames. This time, go to the Power Management tab. For a device under the control of the Intel management system, just uncheck Reduce link speed during system idle.

t20perf_i217speedreduce

For adapters using default drivers, uncheck Allow the computer to turn off this device to save power:

t20perf_regularnetpm

Disabling Network Adapter Power Management in PowerShell

The process is a bit more involved in PowerShell, but I’ve done the hard work for you. Just copy/paste into an elevated PowerShell prompt or run as a script:
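As a simpler stand-in, the NetAdapter module can turn off the operating system’s power-down behavior in one line, although it may not touch driver-specific options such as Intel’s “Reduce link speed during system idle”:

    # Stop Windows from powering down the physical adapters
    Get-NetAdapter -Physical | Disable-NetAdapterPowerManagement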

Disable VMQ on Team Adapters

None of the adapters included with this system or in the eBook build support VMQ. That’s good because I don’t know of any manufacturers that properly implement VMQ on gigabit adapters. However, if you create a native Microsoft LBFO team, VMQ will be enabled on it. Whether or not it does anything… I don’t know. I do know that I seemed to clear up some strange issues when I disabled it on 2012. So, I’ve been doing it ever since. It’s quick and easy, so even if it doesn’t help, it certainly won’t hurt.

Note: If you are using the build from the eBook, only follow this section on the Hyper-V hosts. The storage server won’t use VMQ anyway.

Disabling VMQ on Team Adapters Using the GUI

Find the team adapter in Network Connections. It should be quite obvious, since the icon shows two physical adapters. Its description field will say Microsoft Network Adapter Multiplexor Driver.

t20perf_teamadapter

Open it up and get into its adapter configuration properties just as you did for the physical adapters above. Switch to the Advanced tab. Find the Virtual Machine Queues entry and set it to Disabled:

t20perf_teamadapterkillvmq

Disabling VMQ on Team Adapters in PowerShell

PowerShell can make short work of this task:
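A sketch that matches the description above: find the Microsoft Network Adapter Multiplexor (team) interface and disable VMQ on it:

    # Disable VMQ on the LBFO team interface
    Get-NetAdapter | Where-Object { $_.InterfaceDescription -like '*Multiplexor*' } | Disable-NetAdapterVmq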

Step 4: Storage Performance Tweaks

The disks in these systems are slow. Nothing will change that. But, we can even out their transfer patterns a bit.

Changing the Disk Cache Mode for the Hyper-V Hosts

The Hyper-V hosts don’t do a great deal of disk I/O. In my personal configuration, I do place my domain controllers locally. However, for any domain these systems could reasonably handle, the domain controllers will perform very little I/O. We’ll enable the read cache on these systems. It will help, but you may not see much improvement due to their normal usage pattern.

Note: I have not attempted any of this on a GUI-less system. If the graphical interface works, you’ll find its exe at: “C:\Program Files\Intel\Intel(R) Rapid Storage Technology\IAStorUI.exe”.

Under the Intel Start menu entry, open Intel® Rapid Storage Technology. Switch to the Performance tab. You could disable Link Power Management, but it’s not going to help much on the Hyper-V hosts. Change the Cache mode to Read only.

t20perf_hvstoragecache

Changing the Disk Cache Mode for the Storage Host

The storage server does most of the heavy lifting in this build. We can set some stronger caching policies that will help its performance.

Warning: These steps are safe only if you have a battery backup system that will safely shut down the system in the event of a power outage. As shipped, these systems do not have an internal battery backup for the RAID arrays. You can purchase add-on cards that provide that functionality. My system has one external battery that powers all three hosts. However, its USB interface connects only to the storage system. Do not follow these steps for your Hyper-V hosts unless you have a mechanism to gracefully shut them down in a power outage.

Follow the same steps to access the Intel® Rapid Storage Technology‘s Performance tab as you did on the Hyper-V hosts. This time, disable the power management option, enable write-cache buffer flushing, and set the cache mode to Write back:

t20perf_batterystoragecache

Microsoft’s Tuning Guide

At this point, you’ve made the best improvements that you’re likely to get with this hardware. However, Microsoft publishes tuning guides that might give you a bit more.

Access the 2016 tuning guide: https://docs.microsoft.com/en-us/windows-server/administration/performance-tuning/role/hyper-v-server/index

The 2016 guide doesn’t contain very many instructions to follow; it contains a great deal of information. Aside from changing the power plan to High Performance, you won’t find much to do.

The 2012 R2 guide contains more activities: https://msdn.microsoft.com/en-us/library/windows/hardware/dn567657(v=vs.85).aspx. I do not know how many of these settings are still honored in 2016. I do know that any further changes that you make based on this guide involve trade-offs. For instance, you can disable the I/O balancer; that might speed up I/O for one VM that feels slow, but at the cost of allowing storage bottlenecks.

Test

After any performance change, test things out. You shouldn’t expect to see any Earth-shattering improvements. You definitely don’t want things to become worse. If any issues occur, retrace your steps and undo changes until performance returns — if it returns. It’s not uncommon for performance tweaking to uncover failing hardware. It’s best to carry out these changes soon after you spin up your new equipment for that reason.

Testing Jumbo Frames

Verify that jumbo frames work by pinging a target IP connected through the physical switch, using the following form:
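The form looks like this; 192.168.25.10 stands in for a real address on the jumbo-enabled network:

    ping 192.168.25.10 -f -l 8000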

If pings drop (but normal pings go through) or you receive a message that says, “Packet needs to be fragmented but DF set.”, then something along the way does not support jumbo frames.

The “8000” number doesn’t need to be exact, but it must be large enough to ensure that you are sending a jumbo Ethernet frame (somewhere in the 6000s and over). Due to variances in the way that “jumbo” can be calculated, the displayed “9014” will almost never work. Usually, Windows will send an unfragmented ping no larger than 8972 bytes.

Verify Settings After Driver Updates

Some driver updates return settings to defaults. You might lose jumbo frames and power management settings, for instance. It’s tempting to automate driver settings, but network setting changes cause network transmission interrupts. You’re better off performing manual verification.

 

4 Ways to Transfer Files to a Linux Hyper-V Guest


You’ve got a straightforward problem. You have a file on your Windows machine. You need to get that file into your Linux machine. Your Windows machine runs Hyper-V, and Hyper-V runs your Linux machine as a guest. You have many options.

Method 1) Use PowerShell and Integration Services

This article highlights the PowerShell technique as it’s the newest method, and therefore the least familiar. You’ll want to use this method when the Windows system that you’re working from hosts the target Linux machine. I’ll provide a longer list of the benefits of this method after the how-to.

Prerequisite for Copying a File into a Linux Guest: Linux Integration Services

The PowerShell method that I’m going to show you makes use of the Linux Integration Services (LIS). It doesn’t work on all distributions/versions. Check for your distribution on TechNet. Specifically, look for “File copy from host to guest”.

By default, Hyper-V disables the particular service that allows you to transfer files directly into a guest.

Enabling File Copy Guest Service in PowerShell

The cmdlet to use is Enable-VMIntegrationService. You can just type it out:
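For example, with a hypothetical virtual machine named “svlinux” (the file copy service appears under the name “Guest Service Interface”):

    Enable-VMIntegrationService -VMName 'svlinux' -Name 'Guest Service Interface'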

The Name parameter doesn’t work with tab completion, however, so you need to know exactly what to type in order to use that syntax.

You can use Get-VMIntegrationService for spelling assistance:
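For example, using the same hypothetical VM name; the Name column shows the exact strings to use:

    Get-VMIntegrationService -VMName 'svlinux'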

Enable-VMIntegrationService includes a VMIntegrationService parameter that accepts an object, which can be stored in a variable or piped from Get-VMIntegrationService:
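A sketch of that object/pipeline form, again with the hypothetical VM name:

    Get-VMIntegrationService -VMName 'svlinux' | Where-Object { $_.Name -eq 'Guest Service Interface' } | Enable-VMIntegrationService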

You could leave out the entire where portion and pipe directly in order to enable all services for the virtual machine in one shot.

Use whatever method suits you best. You do not need to power cycle the virtual machine or make any other changes.

Enabling File Copy Guest Service in Hyper-V Manager or Failover Cluster Manager

If you’d prefer to use a GUI, either Hyper-V Manager or Failover Cluster Manager can help. To enable file copy for a guest, open the Settings dialog for the virtual machine in either tool. It does not matter which tool you use. The virtual machine can be On or Off, but it cannot be Saved or Paused.

In the dialog, switch to the Integration Services tab. Check the box for Guest services and click OK.

lincopy_servicescheck

You do not need to power cycle the virtual machine or make any other changes.

Verifying the Linux Guest’s File Copy Service

You can quickly check that the service in the guest is prepared to accept a file from the host:
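From a terminal inside the guest, any process listing will do; for instance:

    ps -ef | grep hyper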

Look in the output for hypervfcopyd:

lincopy_serviceverify

Of course, you can supply more of the name to grep than just “hyper” to narrow it down, but this is easier to remember.

Using Copy-VMFile to Transfer a File into a Linux Guest

All right, now the prerequisites are out of the way. Use Copy-VMFile:
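A minimal example; the VM name and both paths are placeholders for your own:

    Copy-VMFile -VMName 'svlinux' -SourcePath 'C:\Transfer\demo.tar.gz' -DestinationPath '/home/demo/demo.tar.gz' -FileSource Host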

You can run Copy-VMFile remotely:
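The same operation aimed at a remote host (the host name is an example); the source file must exist on that host:

    Copy-VMFile -ComputerName 'svhv1' -VMName 'svlinux' -SourcePath 'C:\Transfer\demo.tar.gz' -DestinationPath '/home/demo/demo.tar.gz' -FileSource Host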

Notice that SourcePath must be from the perspective of ComputerName. Tab completion won’t work remotely, so you’ll need to know the precise path of the source file. It might be easier to use Enter-PSSession first so that tab completion will work.

You can create a directory on the Linux machine when you copy the file:
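Add CreateFullPath to have a single missing directory created for you (again, example paths):

    Copy-VMFile -VMName 'svlinux' -SourcePath 'C:\Transfer\demo.tar.gz' -DestinationPath '/downloads/demo.tar.gz' -FileSource Host -CreateFullPath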

CreateFullPath can only create one folder. If you ask it to create a directory tree (ex: -CreateFullPath '/downloads/new' ), you’ll get an error that includes the text “failed to initiate copying files to the guest: Unspecified error (0x80004005)“.

Benefits and Notes on Using Copy-VMFile for Linux Guests

Some reasons to choose Copy-VMFile over alternatives:

  • I showed you how to use it with the VMName parameter, but Copy-VMFile also accepts VM objects. If you’ve saved the output into a variable from Get-VM or some other cmdlet that produces VM objects, you can use that variable with Copy-VMFile’s VM parameter instead of VMName.
  • The VMName and VM parameters accept arrays, so you can copy a file into multiple virtual machines simultaneously.
  • You do not need a functioning network connection within the Linux guest or between the host and the guest.
  • You do not need to open firewalls or configure any daemons inside the Linux guest.
  • The transfer occurs over the VMBus, so only your hardware capabilities can limit its speed.
  • The transfer operates under the root account, so you can place a file just about anywhere on the target system.

Notes:

  • As mentioned in the preceding list, this process runs as root. Be careful what you copy and where you place it.
  • Copied files are marked as executable for some reason.
    lincopy_executable
  • Copy-VMFile only works from host to guest. The existence of the FileSource parameter implies that you could copy files in the other direction, but that parameter accepts no value other than Host.

Method 2) Using WinSCP

I normally choose WinSCP for moving files to/from any Linux machine, Hyper-V guest or otherwise.

If you choose the SCP protocol when connecting to a Linux system, it will work immediately. You won’t need to install any packages first:

lincopy_scp

Once connected, you have a simple folder display for your local and target machines with simple drag and drop transfer functionality:

lincopy_winscpmain

You can easily modify the permissions and execute bit on a file (as long as you have permission):

lincopy_fileprops

You can use the built-in editor on a file or attach it to external editors. It will automatically save the output from those editors back to the Linux machine:

lincopy_edit

You can even launch a PuTTY session right from WinSCP (if PuTTY is installed):

lincopy_putty

I still haven’t found all of the features of WinSCP.

Method 3) Move Files to/from Linux with the Windows FTP Client

Windows includes a command-line ftp client. It has many features, but still only qualifies as barely more than rudimentary. You can invoke it with something like:
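For instance, where “svlinux” stands in for your FTP host’s name or IP address:

    ftp svlinux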

The above will attempt to connect to the named host and will then start an interactive session. If you’d like to start work from within an interactive session, that would look something like this:
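Launch the client bare, then connect and transfer from its prompt; this is an example session, and the ftp> prompt is shown rather than typed:

    ftp
    ftp> open svlinux
    ftp> get demo.tar.gz
    ftp> bye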

Use ftp /? at the command prompt for command-line assistance and help at the interactive ftp> prompt for interactive assistance.

You’ll have a few problems using this or any other standard FTP client: most Linux distributions do not ship with any FTP daemon running. Most distributions allow you to easily acquire vsftpd. I don’t normally do that because SCP is already enabled and it’s secure.

Method 4) Move Files Between Linux Guests with a Transfer VHDX

If you have a distribution that doesn’t work with Copy-VMFile, or you just don’t want to use it, you can use a portable VHDX file instead.

  1. First, create a disk. Use PowerShell so that the sparse files don’t cause the VHDX file to grow larger than necessary (a sketch of the whole procedure appears after this list).

  2. Attach the VHDX to the Linux guest. If you attach to the virtual SCSI chain, you don’t need to power down the VM.
  3. Inside the Linux guest, create an empty mount location.
  4. Determine which of the attached disks can be used for transfer with sudo fdisk -l. You are looking for a /dev/sd* item that has FAT32 partition information.
    Do not use:
    lincopy_diskinuse
    Use:
    lincopy_disknotinuse
  5. Enter the partitioning commands shown in the sketch after this list. The outputs will show you what you’re doing; I’m only telling you what to type.
  6. Run sudo fdisk -l to verify that your new disk now has a W95 FAT32 partition. You need FAT32 because it’s the only file system that both Linux and Windows can use without extra effort, and that extra effort isn’t worth it for a transfer disk.
  7. Format your new partition (the final command in the sketch below).
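A minimal sketch of the whole procedure; it assumes the new VHDX lives at C:\Transfer\transfer.vhdx on the host and shows up as /dev/sdb inside the guest, so substitute your own names and double-check the device letter before partitioning:

    # On the Hyper-V host: create a small, dynamically expanding transfer disk
    New-VHD -Path 'C:\Transfer\transfer.vhdx' -SizeBytes 10GB -Dynamic

    # Inside the Linux guest: mount point, partition, verify, format
    sudo mkdir /transfer
    sudo fdisk /dev/sdb       # interactive: n, p, 1, accept defaults, then t, b (W95 FAT32), then w
    sudo fdisk -l             # confirm the new W95 FAT32 partition
    sudo mkfs.vfat /dev/sdb1  # format the new partition as FAT32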

You have successfully created your transfer disk.

Use a Transfer Disk in Linux

To use a transfer disk on the Linux side, you need to attach it to the Linux machine. Then you need to mount it:

  1. Use sudo fdisk -l to verify which device Linux has assigned the disk to. Use the preceding section for hints.
  2. Once you know which device it is, mount it to your transfer mount point: mount /dev/sdb1 /transfer.  Move/copy files into/out of the /transfer folder.
  3. Once you’re finished, unmount the disk from the folder: umount /transfer (or, equivalently, umount /dev/sdb1).
  4. Detach the VHDX from the virtual machine.

Use a Linux Transfer Disk in Windows

You mount a VHDX in Windows via Mount-VHD (must be running Hyper-V), Mount-DiskImage, or Disk Management. Once mounted, work with it as you normally would. Mount-VHD and Disk Management will attach it to a unique drive letter; Mount-DiskImage will mount to the empty path that you specify. Once you’re finished working with it, you can use Dismount-VHD, Dismount-DiskImage (don’t forget -Save!), or Disk Management.

Be aware that even though Windows should have no trouble reading a FAT32 partition/volume created in Linux, the opposite is not true! Do not use Windows formatting tools for a Linux transfer disk! Your mileage may vary, but formatting in Linux always works, so stick to that method.

 

Hyper-V Backup Best Practices: Terminology and Basics


 

One of my very first jobs performing server support on a regular basis was heavily focused on backup. I witnessed several heart-wrenching tragedies of permanent data loss but, fortunately, played the role of data savior much more frequently. I know that most, if not all, of the calamities could have at least been lessened had the data owners been more educated on the subject of backup. I believe very firmly in the value of a solid backup strategy, which I also believe can only be built on the basis of a solid education in the art. This article’s overarching goal is to give you that education by serving a number of purposes:

  • Explain industry-standard terminology and how to apply it to your situation
  • Address and wipe away 1990s-style approaches to backup
  • Clearly illustrate backup from a Hyper-V perspective

Backup Terminology

Whenever possible, I avoid speaking in jargon and TLAs/FLAs (three-/four-letter acronyms) unless I’m talking to a peer that I’m certain has the experience to understand what I mean. When you start exploring backup solutions, you will have these tossed at you rapid-fire with, at most, brief explanations. If you don’t understand each and every one of the following, stop and read those sections before proceeding. If you’re lucky enough to be working with an honest salesperson, it’s easy for them to forget that their target audience may not be completely following along. If you’re less fortunate, it’s simple for a dishonest salesperson to ridiculously oversell backup products through scare tactics that rely heavily on your incomplete understanding.

  • Backup
  • Full/incremental/differential backup
  • Delta
  • Deduplication
  • Inconsistent/crash-consistent/application-consistent
  • Bare-metal backup/restore (BMB/BMR)
  • Disaster Recovery/Business Continuity
  • Recovery point objective (RPO)
  • Recovery time objective (RTO)
  • Retention
  • Rotation — includes terms such as Grandfather-Father-Son (GFS)

There are a number of other terms that you might encounter, although these are the most important for our discussion. If you encounter a vendor making up their own TLAs/FLAs, take a moment to investigate their meaning in comparison to the above. Most are just marketing tactics — inherently harmless attempts by a business entity trying to turn a coin by promoting its products. Some are more nefarious — attempts to invent a nail for which the company just conveniently happens to provide the only perfectly matching hammer (with an extra “value-added” price, of course).

Backup

This heading might seem pointless — doesn’t everyone know what a backup is? In my experience, no. In order to qualify as a backup, you must have a distinct, independent copy of data. A backup cannot have any reliance on the health or well-being of its source data or the media that contains that data. Otherwise, it is not a true backup.

Full/Incremental/Differential Backups

Recent technology changes and their attendant strategies have made this terminology somewhat less popular than in past decades, but it is still important to understand because it is still in widespread use. They are presented in a package because they make the most sense when compared to each other. So, I’ll give you a brief explanation of each and then launch into a discussion.

  • Full Backups: Full backups are the easiest to understand. They are a point-in-time copy of all target data.
  • Differential Backups: A differential backup is a point-in-time copy of all data that is different from the last full backup that is its parent.
  • Incremental Backups: An incremental backup is a point-in-time copy of all data that is different from the backup that is its parent.

The full backup is the safest type because it is the only one of the three that can stand alone in any circumstances. It is a complete copy of whatever data has been selected.

Full Backup


A differential backup is the next safest type. Remember the following:

  • To fully restore the latest data, a differential backup always requires two backups: the latest full backup and the latest differential backup. Intermediary differential backups, if any exist, are not required.
  • It is not necessary to restore from the most recent differential backup if an earlier version of the data is required.
  • Depending on what data is required and the intelligence of the backup application, it may not be necessary to have both backups available to retrieve specific items.

The following is an illustration of what a differential backup looks like:

Differential Backup


Each differential backup goes all the way back to the latest full backup as its parent. Also, notice that each differential backup is slightly larger than the preceding differential backup. This phenomenon is conventional wisdom on the matter. In theory, each differential backup contains the previous backup’s changes as well as any new changes. In reality, it truly depends on the change pattern. A file backed up on Monday might have been deleted on Tuesday, so that part of the backup certainly won’t be larger. A file that changed on Tuesday might have had half its contents removed on Wednesday, which would make that part of the backup smaller. A differential backup can range anywhere from essentially empty (if nothing changed) to as large as the source data (if everything changed). Realistically, you should expect each differential backup to be slightly larger than the previous.

The following is an illustration of an incremental backup:

Incremental Backup


Incremental backups are best thought of as a chain. The above shows a typical daily backup in an environment that uses a weekly full with daily incrementals. If all data is lost and restoring to Wednesday’s backup is necessary, then every single night’s backup from Sunday onward will be necessary. If any one is missing or damaged, then it will likely not be possible to retrieve anything from that backup or any backup afterward. Therefore, incremental backups are the riskiest; they are also the fastest and consume the least amount of space.

Historically, full/incremental/differential backups have been facilitated by an archive bit in Windows. Anytime a file is changed, Windows sets its archive bit. The backup types operate with this behavior:

  • A full backup captures all target files and clears any archive bits that it finds.
  • A differential backup captures only target files that have their archive bit set and it leaves the bit in the state that it found it.
  • An incremental backup captures only files with the archive bit set and clears it afterward.

Archive Bit Example
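If you want to see the archive bit for yourself, PowerShell and the old attrib command both expose it; the file path below is only an example:

    # Show the attributes; "Archive" appears if the bit is set
    (Get-Item 'C:\Data\example.docx').Attributes
    # Clear the bit, the way a full or incremental backup would
    attrib -A 'C:\Data\example.docx'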

Delta

“Delta” is probably the most overthought word in all of technology. It means “difference”. Do not analyze it beyond that. It just means “difference”. If you have $10 in your pocket and you buy an item for $8, the $8 dollars that you spent is the “delta” between the amount of money that you had before you made the purchase and the amount of money that you have now.

The way that vendors use the term “delta” sometimes changes, but usually not by a great deal. In the earliest incarnation of “delta” as applied to backups that I am aware of, it meant intra-file changes. All previous backup types operated with individual files being the smallest level of granularity (not counting specialty backups such as Exchange item-level). Delta backups would analyze the blocks of individual files, making the granularity one step finer.

The following image illustrates the delta concept:

Delta Backup


A delta backup is essentially an incremental backup, but at the block level instead of the file level. Somebody got the clever idea to use the word “delta”, probably so that it wouldn’t be confused with “differential”, and the world thought it must mean something extra special because it’s Greek.

The major benefit of delta backups is that they use much less space than even incremental backups. The trade-off is the computing power needed to calculate deltas. The archive bit can tell the backup application whether a file needs to be scanned, but not which blocks changed. Backup systems that perform delta operations require some other method for change tracking.

Deduplication

Deduplication represents the latest iteration of backup innovation. The term explains itself quite nicely. The backup application searches for identical blocks of data and reduces them to a single copy.

Deduplication involves three major feats:

  • The algorithm that discovers duplicate blocks must operate in a timely fashion
  • The system that tracks the proper location of duplicated blocks must be foolproof
  • The system that tracks the proper location of duplicated blocks must use significantly less storage than simply keeping the original blocks

So, while deduplication is conceptually simple, implementations can depend upon advanced computer science.

Deduplication’s primary benefit is that it can produce backups that are even smaller than delta systems. Part of that will depend on the overall scope of the deduplication engine. If you were to run fifteen new Windows Server 2016 virtual machines through even a rudimentary deduplicator, it would reduce all of them to the size of approximately a single Windows Server 2016 virtual machine — a 93% savings.

There is risk in overeager implementations, however. With all data blocks represented by a single copy, each block becomes a single point of failure. The loss of a single vital block could spell disaster for a backup set. This risk can be mitigated by employing a single pre-existing best practice: always maintain multiple backups.

Inconsistent/Crash-Consistent/Application-Consistent

We already have an article set that explores these terms in some detail. Quickly:

  • An inconsistent backup is effectively the same thing as performing a manual file copy of a directory tree.
  • A crash-consistent backup captures data as it sits on the storage volume at a given point in time, but cannot touch anything passing through the CPU or waiting in memory. You could lose any in-flight I/O operations.
  • An application-consistent backup coordinates with the operating system and, where possible, individual applications to ensure that in-flight I/Os are flushed to disk so that there are no active file changes at the moment the backup is taken.

I occasionally see people twisting these terms around, although I believe that’s mostly accidental. The definitions that I used above have been the most common, stretching back into the 90s. Be aware that there are some disagreements, so ensure that you clarify terminology with any salespeople.

Bare-Metal Backup/Restore

A so-called “bare-metal backup” and/or “bare-metal restore” involves capturing the entirety of a storage unit, including metadata portions such as the boot sector. These backup/restore types essentially mean that you could restore data to a completely empty physical system without needing to install an operating system and/or backup agent on it first.

Disaster Recovery/Business Continuity

The terms “Disaster Recovery” (DR) and “Business Continuity” are often used somewhat interchangeably in marketing literature. “Disaster Recovery” is the older term and more accurately reflects the nature of the involved solutions. “Business Continuity” is a newer, more exciting version that sounds more positive but mostly means the same thing. These two terms encompass not just restoring data, but restoring the organization to its pre-disaster state. “Business Continuity” is used to emphasize the notion that, with proper planning, disasters can have little to no effect on your ability to conduct normal business. Of course, the more “continuous” your solution is, the higher your costs are. That’s not necessarily a bad thing, but it must be understood and expected.

One thing that I really want to make very clear about disaster recovery and/or business continuity is that these terms extend far beyond just backing up and restoring your data. DR plans need to include downtime procedures, phone trees, alternative working sites, and a great deal more. You need to think all the way through a disaster from the moment that one occurs to the moment that everything is back to some semblance of normal.

Recovery Point Objective

The maximum acceptable span of time between the latest backup and a data loss event is called a recovery point objective (RPO). If the words don’t sound very much like their definition, that’s because someone worked really hard to couch a bad situation within a somewhat neutral term. If it helps, the “point” in RPO means “point in time.” Of all data adds and changes, anything that happens between backup events has the highest potential of being lost. Many technologies have some sort of fault tolerance built in; for instance, if your domain controller crashes and it isn’t due to a completely failed storage subsystem, you’re probably not going to need to go to backup. Most other databases can tell a similar story. RPOs mostly address human error and disaster. More common failures should be addressed by technology branches other than backup, such as RAID.

A long RPO means that you are willing to lose a greater span of time’s worth of data. A daily backup gives you a 24-hour RPO. Taking backups every two hours results in a 2-hour RPO. Remember that an RPO represents a maximum. It is highly unlikely that a failure will occur immediately prior to the next backup operation.

Recovery Time Objective

Recovery time objective (RTO) represents the maximum amount of time that you are willing to wait for systems to be restored to a functional state. This term sounds much more like its actual meaning than RPO. You need to take extra care when talking with backup vendors about RTO. They will tend to only talk about RTO in terms of restoring data to a replacement system. If your primary site is your only site and you don’t have a contingency plan for complete building loss, your RTO is however long it takes to replace that building, fill it with replacement systems, and restore data to those systems. Somehow, I suspect that a six-month or longer RTO is unacceptable for most institutions. That is one reason that DR planning must extend beyond taking backups.

In more conventional usage, RTOs will be explained as though there is always a target system ready to receive the restored data. So, if your backup drives are taken offsite to a safety deposit box by the bookkeeper when she comes in at 8 AM, your actual recovery time is essentially however long it takes someone to retrieve the backup drive plus the time needed to perform a restore in your backup application.

Retention

Retention is the desired amount of time that a backup should be kept. This deceptively simple description hides some complexity. Consider the following:

  • Legislation mandates a ten-year retention policy on customer data for your industry. A customer was added in 2007. Their address changed in 2009. Must the customer’s data be kept until 2017 or 2019?
  • Corporate policy mandates that all customer information be retained for a minimum of five years. The line-of-business application that you use to record customer information never deletes any information that was placed into it and you have a copy of last night’s data. Do you need to keep the backup copy from five years ago or is having a recent copy of the database that contains five-year-old data sufficient?

Questions such as these can plague you. Historically, monthly and annual backup tapes were simply kept for a specific minimum number of years and then discarded, which more or less answered the question for you. Tape is an expensive solution, however, and many modern small businesses do not use it. Furthermore, laws and policies only dictate that the data be kept; nothing forced anyone to ensure that the backup tapes were readable after any specific amount of time. One lesson that many people learn the hard way is that tapes stored flat can lose data after a few years. We used to joke with customers that their bits were sliding down the side of the tape. I don’t actually understand the governing electromagnetic phenomenon, but I can verify that it does exist.

With disk-based backups, the possibilities change somewhat. People typically do not keep stacks of backup disks lying around, and disks’ ability to hold data for long periods is not the same as that of backup tape. The rules are different: some disks will outlive tape; others will not.

Rotation

Backup rotations deal with the media used to hold backup information. This has historically meant tape, and tape rotations often came in some very grandiose schemes. One of the most widely used rotations is called “Grandfather-Father-Son” (GFS):

  • One full backup is taken monthly. The media it is taken on is kept for an extended period of time, usually one year. One of these each year is often designated the annual backup and kept even longer. This backup is called the “Grandfather”.
  • Each week thereafter, on the same day, another full backup is taken. This media is usually rotated so that it is re-used once per month. This backup is known as the “Father”.
  • On every day between full backups, an incremental backup is taken. Each day’s media is rotated so that it is re-used on the same day each week. This backup is known as the “Son”.

The purpose of rotation is to have enough backups to provide sufficient possible restore points to guard against a myriad of possible data loss instances without using so much media that you bankrupt yourself and run out of physical storage room. Grandfathers are taken offsite and placed in long-term storage. Fathers are taken offsite, but perhaps not placed in long-term storage so that they are more readily accessible. Sons are often left onsite, at least for a day or two, to facilitate rapid restore operations.
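
The scheme sounds grander than it is. The few lines below classify today's backup under GFS; the choice of Friday for full backups and the retention notes are only examples, not a recommendation.

    # Decide which GFS tier today's backup belongs to (Friday fulls are an arbitrary example)
    $today = Get-Date
    $tier = if ($today.DayOfWeek -eq 'Friday' -and $today.Day -le 7) {
        'Grandfather: monthly full, retained for roughly a year'
    } elseif ($today.DayOfWeek -eq 'Friday') {
        'Father: weekly full, media reused monthly'
    } else {
        'Son: daily incremental, media reused weekly'
    }
    '{0:yyyy-MM-dd}: {1}' -f $today, $tier
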

Replacing Old Concepts with New Best Practices

Some backup concepts are simply outdated, especially for the small business. Tape used to be the only feasible mass storage device that could be written and rewritten on a daily basis while remaining sufficiently portable. I recall being chastised by a vendor representative in 2004 because I was “still” using tape when I “should” have been backing up to his expensive SAN. I asked him, “Oh, do employees tend to react well when someone says, ‘The building is on fire! Grab the SAN and get out!’?” He suddenly didn’t want to talk to me anymore.

The other somewhat outdated issue is that backups used to take a very, very long time. Tape was not very fast, disks were not very fast, networks were not very fast. Differential and incremental backups were partly an answer to that problem and partly an answer to limited tape capacity. Today, we have gigantic and relatively speedy portable hard drives, networks that can move at least many hundreds of megabits per second, and external buses like USB 3 that outrun both of those things. We no longer need all weekend and an entire media library to perform a full backup.

One thing that has not changed is the need for backups to exist offsite. You cannot protect against the loss of a building if all of your data stays in that building. Solutions have evolved, though. You can now afford to purchase large amounts of bandwidth and transmit your data offsite to your alternative business location(s) each night. If you haven’t got an alternative business location, there are countless vendors that would be happy to store your data each night in exchange for a modest (or not so modest) sum of money. I still counsel periodically taking an offline, offsite backup copy, as that is a solid way to protect your organization against malicious attacks (some of which come from disgruntled staff).

These are the approaches that I would take today that would not have been available to me a few short years ago:

  • Favor full backups whenever possible — incremental, differential, delta, and deduplicated backups are wonderful, but they are incomplete by nature. It must never be forgotten that the strength of backup lies in the fact that it creates duplicates of data. Any backup technique that reduces duplication dilutes the purpose of backup. I won’t argue against anyone saying that there are many perfectly valid reasons for doing so, but such usage must be balanced. Backup systems are larger and faster than ever before; if you can afford the space and time for full copies, get full copies.
  • Steer away from complicated rotation schemes like GFS whenever possible. Untrained staff will not understand them and you cannot rely on the availability of trained staff in a crisis.
  • Encrypt every backup every time.
  • Spend the time to develop truly meaningful retention policies. You can easily throw a tape in a drawer for ten years. You’ll find that more difficult with a portable disk drive. Then again, have you ever tried restoring from a ten-year-old tape?
  • Be open to the idea of using multiple backup solutions simultaneously. If using a combination of applications and media types solves your problem and it’s not too much overhead, go for it.

There are a few best practices that are just as applicable now as ever:

  • Periodically test your backups to ensure that data is recoverable
  • Periodically review what you are backing up and what your rotation and retention policies are to ensure that you are neither shorting yourself on vital data nor wasting backup media space on dead information
  • Backup media must be treated as vitally sensitive mission-critical information and guarded against theft, espionage, and damage
    • Magnetic media must be kept away from electromagnetic fields
    • Tapes must be stored upright on their edges
    • Optical media must be kept in dark storage
    • All media must be kept in a cool environment with a constant temperature and low humidity
  • Never rely on a single backup copy. Media can fail, get lost, or be stolen. Backup jobs don’t always complete.

Hyper-V-Specific Backup Best Practices

I want to dive into the nuances of backup and Hyper-V more thoroughly in later articles, but I won’t leave you here without at least bringing them up.

  • Virtual-machine-level backups are a good thing. That might seem a bit self-serving since I’m writing for Altaro and they have a virtual-machine-level backup application, but I fit well here because of shared philosophy. A virtual-machine-level backup gives you the following:
    • No agent installed inside the guest operating system
    • Backups are automatically coordinated for all guests, meaning that you don’t need to set up some complicated staggered schedule to prevent overlaps
    • No need to reinstall guest operating systems separately from restoring their data
  • Hyper-V versions prior to 2016 do not have a native changed block tracking mechanism, so virtual-machine-level backup applications that perform delta and/or deduplication operations must perform a substantial amount of processing. Keep that in mind as you are developing your rotations and scheduling.
  • Hyper-V will coordinate between backup applications that run at the virtual-machine level (like Altaro VM) and the VSS writer(s) within Windows guest operating systems and the integration components within Linux guest operating systems. This enables application-consistent backups without doing anything special beyond ensuring that the integration components/services are up-to-date and activated (a quick check follows this list).
  • For physical installations, no application can perform a bare metal restore operation any more quickly than you can perform a fresh Windows Server/Hyper-V Server installation from media (or better yet, a WDS system). Such a physical server should only have very basic configuration and only backup/management software installed. Therefore, backing up the management operating system is typically a completely pointless endeavor. If you feel otherwise, I want to know what you installed in the management operating system that would make a bare-metal restore worth your time, as I’m betting that such an application or configuration should not be in the management operating system at all.
  • Use your backup application’s ability to restore a virtual machine next to its original so that you can test data integrity
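
As mentioned in the list above, application-consistent backups depend on healthy integration components. Here is one quick way to check a guest; the virtual machine name is a placeholder.

    # Show the state of each integration service offered to the guest (VM name is an example)
    Get-VMIntegrationService -VMName 'demo-vm' |
        Select-Object -Property Name, Enabled, PrimaryStatusDescription

    # The VM object also summarizes whether the guest's integration components are current
    (Get-VM -Name 'demo-vm').IntegrationServicesState
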

Follow-Up Articles

With the foundational material supplied in this article, I intend to work on further posts that expand on these thoughts in greater detail. If you have any questions or concerns about backing up Hyper-V, let me know. Anything that I can’t answer quickly in comments might find its way into an article.

Critical Status in Hyper-V Manager

I’m an admitted technophile. I like blinky lights and sleek chassis and that new stuff smell and APIs and clicking through interfaces. I wouldn’t be in this field otherwise. However, if I were to compile a list of my least favorite things about unfamiliar technology, that panicked feeling when something breaks would claim the #1 slot. I often feel that systems administration sits diametrically opposite medical care. We tend to be comfortable learning by poking and prodding at things while they’re alive. When they’re dead, we’re sweating — worried that anything we do will only make the situation worse. For many of us in the Hyper-V world, that feeling first hits with the sight of a virtual machine in “Critical” status.

If you’re there, I can’t promise that the world hasn’t ended. I can help you to discover what it means and how to get back on the road to recovery.

The Various “Critical” States in Hyper-V Manager

If you ever look at the underlying WMI API for Hyper-V, you’ll learn that virtual machines have a long list of “sick” and “dead” states. Hyper-V Manager distills these into a much smaller list for its display. If you have a virtual machine in a “Critical” state, you’re only given two control options: Connect and Delete:

Hyper-V Manager showing a virtual machine in a Critical state

We’re fortunate enough in this case that the Status column gives some indication as to the underlying problem. That’s not always the case. That tiny bit of information might not be enough to get you to the root of the problem.

For starters, be aware that any state that includes the word “Critical” typically means that the virtual machine’s storage location has a problem. The storage device might have failed. The host may not be able to connect to storage. If you’re using SMB 3, the host might be unable to authenticate.

You’ll notice that there’s a hyphen in the state display. Before the hyphen will be another word that indicates the current or last known power state of the virtual machine. In this case, it’s Saved. I’ve only ever seen three states:

  • Off-Critical: The virtual machine was off last time the host was able to connect to it.
  • Saved-Critical: The virtual machine was in a saved state the last time the host was able to connect to it.
  • Paused-Critical: The paused state typically isn’t a past condition. This one usually means that the host can still talk to the storage location, but it has run out of free space.

There may be other states that I have not discovered. However, if you see the word “Critical” in a state, assume a storage issue.
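
If you would like to see the raw values that Hyper-V Manager condenses into these labels, you can query the WMI class directly on the host. This is only a quick peek; the numeric codes are documented under Msvm_ComputerSystem.

    # Raw state and health values for each virtual machine on this host
    Get-CimInstance -Namespace 'root\virtualization\v2' -ClassName Msvm_ComputerSystem |
        Where-Object { $_.Caption -eq 'Virtual Machine' } |
        Select-Object -Property ElementName, EnabledState, HealthState, OperationalStatus
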

Learning More About the Problem

If you have a small installation, you probably already know enough at this point to go find out what’s wrong. If you have a larger system, you might only be getting started. With only Connect and Delete, you can’t find out what’s wrong. You need to start by discovering the storage location that’s behind all of the fuss. Since Hyper-V Manager won’t help you, it’s PowerShell to the rescue:
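
Something along these lines will do; the VM name is a placeholder, and on 2012 R2 the checkpoint property is named SnapshotFileLocation rather than CheckpointFileLocation:

    Get-VM -Name 'demo-vm' | Format-List -Property *
    Get-VM -Name 'demo-vm' | Format-List -Property Name, Path, ConfigurationLocation, CheckpointFileLocation, SmartPagingFilePath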

Remember to use your own virtual machine name for best results. The first of those two lines will show you all of the virtual machine’s properties. It’s easier to remember in a pinch, but it also displays a lot of fields that you don’t care about. The second one pares the output list down to show only the storage-related fields. My output:

PowerShell output showing the virtual machine’s storage-related properties

The Status field specifically mentioned the configuration location. As you can see, the same storage location holds all of the components of this particular virtual machine. We are not looking at anything related to the virtual hard disks, though. For that, we need a different cmdlet:
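
Again, these are representative commands rather than the only possibility; substitute your own virtual machine's name:

    Get-VMHardDiskDrive -VMName 'demo-vm'
    Get-VMHardDiskDrive -VMName 'demo-vm' | Select-Object -ExpandProperty Path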

Again, I recommend that you use the name of your virtual machine instead of mine. The first cmdlet will show a table display that includes the path of the virtual hard disk file, but it will likely be truncated. There’s probably enough to get you started. If not, the second shows the entire path.

PowerShell output showing the full path of the virtual hard disk

Everything that makes up this virtual machine happens to be on the same SMB 3 share. If yours is on an iSCSI target, use iscsicpl.exe to check the status of connected disks. If you’re using Fibre Channel, your vendor’s software should be able to assist you.

Correcting the Problem

In my case, the Server service was stopped on the system that I use to host SMB 3 shares. It got that way because I needed to set up a scenario for this article. To return the virtual machine to a healthy state, I only needed to start that service and wait a few moments.

Your situation will likely be different from mine, of course. Your first goal is to rectify the root of the problem. If the storage is offline, bring it up. If there’s a disconnect, reconnect. After that, simply wait. Everything should take care of itself.

When I power down my test cluster, I tend to encounter this issue upon turning everything back on. I could start my storage unit first, but the domain controllers are on the Hyper-V hosts so nothing can authenticate to the storage unit even if it’s on. I could start the Hyper-V hosts first, but then the storage unit isn’t there to challenge authentication. So, I just power the boxes up in whatever order I come to them. All I need to do is wait — the Hyper-V hosts will continually try to reach storage, and they’ll eventually be successful.

If the state does not automatically return to a normal condition, restart the “Hyper-V Virtual Machine Management” service. You’ll find it by that name in the Services control panel applet. In an elevated PowerShell session:
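
The service's short name is vmms, so something like this will do:

    # Restarting the management service does not shut down running virtual machines
    Restart-Service -Name vmms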

At an administrative command prompt:
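
The equivalent with the classic service commands:

    net stop vmms
    net start vmms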

That should clear up any remaining status issues. If it doesn’t, there is still an issue communicating with storage. Or, in the case of the Paused condition, the host still doesn’t believe that the location has sufficient space to safely run the virtual machine(s).

Less Common Corrections

If you’re certain that the target storage location does not have issues and the state remains Critical, then I would move on to repairs. Try chkdsk. Try resetting/rebooting the storage system. It’s highly unlikely that the Hyper-V host is at fault, but you can also try rebooting that.

Sometimes, the constituent files are damaged or simply gone. Make sure that you can find the actual .xml (2012 R2 and earlier) or .vmcx (2016 and later) file that represents the virtual machine. Remember that it’s named with the virtual machine’s unique identifier. You can find that with PowerShell:
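
For example (substitute your own virtual machine's name):

    # The GUID in the configuration file's name is the virtual machine's ID
    (Get-VM -Name 'demo-vm').VMId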

If the files are misplaced or damaged, your best option is restore. If that’s not an option, then Delete might be your only choice. Delete will remove any remainders of the virtual machine’s configuration files, but will not touch any virtual hard disks that belong to the virtual machine. You can create a new one and reattach those disk files.

Best of luck to you.
