As you saw in our earlier article that explained how Live Migration works, and how you can get it going in 2012 and 2012 R2, it is a process that requires a great deal of cooperation between source and target computer. It’s also a balancing act involving a great deal of security concerns. Failures within a cluster are uncommon, especially if the cluster has passed the validation wizard. Failures in Shared Nothing Live Migrations are more likely. Most issues are simple to isolate and troubleshoot.
There are several known problem-causers that I can give you direct advice on. Some are less common. If you can’t find exactly what you’re looking for in this post, I can at least give you a starting point.
Migration-Related Event Log Entries
If you’re moving clustered virtual machines, the Cluster Events node in Failover Cluster Manager usually collects all the relevant events. If they’ve been reset or expired from that display, you can still use Event Viewer at these paths:
- Applications and Service Logs\Microsoft\Windows\Hyper-V-High-Availability\Admin
- Applications and Service Logs\Microsoft\Windows\Hyper-V-VMMS\Admin
The “Hyper-V-High-Availability” tree usually has the better messages, although it has a few nearly useless ones, such as event ID 21111, “Live migration of ‘Virtual Machine VMName’ failed.” Most Live Migration errors come with one of three statements:
- Migration operation for VMName failed
- Live Migration did not succeed at the source
- Live Migration did not succeed at the destination
These will usually, but not always, be accompanied by supporting text that further describes the problem. “Source” messages often mean that the problem is so bad and obvious that Hyper-V can’t even attempt to move the virtual machine. These usually have the most helpful accompanying text. “Destination” messages usually mean that either there is a configuration mismatch that prevents migration or the problem did not surface until the migration was either underway or nearly completed. You might find that these have no additional information or that what is given is not very helpful. In that case, specifically check for permissions issues and that the destination host isn’t having problems accessing the virtual machine’s storage.
Inability to Create Symbolic Links
As we talked about each virtual machine migration method in our explanatory article, part of the process is for the target host to create a symbolic link in C:\ProgramData\Microsoft\Windows\Hyper-V\Virtual Machines. This occurs under a special built-in credential named NT Virtual Machine\Virtual Machines, which has a “well-known” (read as: always the same) security identifier (SID) of S-1-5-83-0.
Some attempts to harden Hyper-V result in a policy being assigned from the domain involve only granting the Computer Configuration\Windows Settings\Security Settings\Local Policies\User Rights Assignment\Create symbolic links right to the built-in Administrators account. Doing so will certainly cause Live Migrations to fail and can sometimes cause other virtual machine creation events to fail.
Your best option is to just not tinker with this branch of group policy. I haven’t ever even heard of an attack mitigated by trying to improve on the contents of this area. If you simply must override it from the domain, then add in an entry in your group policy for it. You can just type in the full name as shown in the first paragraph of this section.
Note: The “Run as a Service” right must also be assigned to the same account. Not having that right usually causes more severe problems than Live Migration issues, but it’s mentioned here for completeness.
Inability to Perform Actions on Remote Computers
Live Migration and Shared Nothing Live Migration invariably involves two computers at minimum. If you’re sitting at your personal computer with a Failover Cluster Manager or PowerShell open, telling Host A to migrate a virtual machine to Host B, that’s three computers that are involved. Most Access Denied errors during Live Migrations involve this multi-computer problem.
Solution 1: CredSSP
CredSSP is kind of a terrible thing. It allows one computer to store the credentials for a user or computer and then use them on a second computer. It’s sort of like cached credentials, only transmitted over the network. It’s not overly insecure, but it’s also not something that security officers are overly fond of. You can set this option on the Authentication protocol section of the Advanced Features section of the Live Migration configuration area on the Hyper-V Settings dialog.
CredSSP has the following cons:
- Not as secure as Kerberos
- Only works when logged in directly to the source host
CredSSP has only one pro: You don’t have to configure delegation. My preference? Configure delegation.
Solution 2: Delegation
Delegation can be a bit of a pain to configure, but in the long-term it is worth it. Delegation allows one computer to pass on Kerberos tickets to other computers. It doesn’t have CredSSP’s hop limit; computers can continue passing credentials on to any computer that they’re allowed to delegate to as far as is necessary.
Delegation has the following cons:
- It can be tedious to configure.
- If not done thoughtfully, can needlessly expose your environment to security risks.
Delegation’s major pro is that as long as you can successfully authenticate to one host that can delegate, you can use it to Live Migrate to or from any host it has delegation authority for.
As far as the “thoughtfulness”, the first step is to use Constrained Delegation. It is possible to allow a computer to pass on credentials for any purpose, but it’s unnecessary.
Delegation is done using Active Directory Users and Computers or PowerShell. I have written an article that explains both ways and includes a full PowerShell script to make this much easier for multiple machines.
Be aware that delegation is not necessary for Quick or Live Migrations between nodes of the same cluster.
Mismatched Physical CPUs
Since you’re taking active threads and relocating them to a CPU in another computer, it seems only reasonable that the target CPU must have the same instruction set as the source. If it doesn’t, the migration will fail. There are hard and soft versions of this story. If the CPUs are from different manufacturers, that’s a hard stop. Live Migration is not possible. If the CPUs are from the same manufacturer, that could be a soft problem. Use CPU compatibility mode:
As shown in the screenshot, the virtual machine needs to be turned off to change this setting.
A very common question is: What are the effects of CPU compatibility mode? For almost every standard server usage, the answer is: none. Every CPU from a manufacturer has a common core set of available instructions and usually a few common extensions. Then, they have extra function sets. Applications can query the CPU for its CPUID information, which contains information about its available function sets. When the compatibility mode box is checked, all of those extra sets are hidden; the virtual machine and its applications can only see the common instruction sets. These extensions are usually related to graphics processing and are almost never used by any server software. So, VDI installations might have trouble when enabling this setting, but virtualized server environments usually will not.
This screenshot was taken from a virtual machine with compatibility disabled using CPU-Z software:
The following screen shot shows the same virtual machine with no change made except the enabling of compatibility mode:
Notice how many things are the same and what is missing from the Instructions section.
If the target system cannot satisfy the memory or disk space requirements of the virtual machine, any migration type will fail. These errors are usually very specific about what isn’t available.
Virtual Switch Name Mismatch
The virtual machine must be able to connect its virtual adapter(s) to a switch(es) with the same name. Furthermore, a clustered virtual machine cannot move if it is using an internal or private virtual switch, even if the target host has a switch with the same name.
If it’s a simple problem with a name mismatch, you can use Compare-VM to overcome the problem while still performing a Live Migration. The basic process is to use Compare-VM to generate a report, then pass that report to Move-VM. Luke Orellan has written an article explaining the basics of using Compare-VM. If you need to make other changes, such as where the files are stored, notice that Compare-VM has all of those parameters. If you use a report from Compare-VM with Move-VM, you cannot supply any other parameters to Move-VM.
Live Migration Error 0x8007274C
This is a very common error in Live Migration that always traced to network problems. If the source and destination hosts are in the same cluster, start by running the Cluster Validation Wizard, only specifying the network tests. That might tell you right away what the problem is. Other possibilities:
- Broken or not completely attached cables
- Physical adapter failures
- Physical switch failures
- Switches with different configurations
- Teams with different configurations
- VMQ errors
- Jumbo frame misconfiguration
If the problem is intermittent, check teaming configurations first; one pathway might be clear while another has a problem.
Storage Connectivity Problems
A maddening cause for some “Live Migration did not succeed at the destination” messages is a problem with storage connectivity. These aren’t always obvious, because everything might appear to be in order. Do any independent testing that you can. Specifically:
- If the virtual machine is placed on SMB 3 storage, double-check that the target host has Full Control on the SMB 3 share and the backing NTFS locations. If possible, remove and re-add the host.
- If the virtual machine is placed on iSCSI or remote Fibre Channel storage, double-check that it can work with files in that location. iSCSI connections sometimes fail silently. Try disconnecting and reconnecting to the target LUN. A reboot might be required.