In Why IT Admins Adore vMotion, I described the main types of vSphere live migrations: vMotion, Storage vMotion and Enhanced vMotion. In this post, I’ll go over 5 tips to help you troubleshoot vMotion issues on the fly.
How does vMotion work?
A vMotion event can be initiated by a user, by vCenter Server itself or via an API call. There are 8 stages to a vMotion event as illustrated in Fig. 1.
Although the above steps are self-explanatory, there’s much more going on under the hood. Let’s just say that vMotion is a marvelous piece of engineering. Here’s a video by VMware explaining vMotion and how to use it.
https://www.youtube.com/watch?v=YH0he0nz8Mg
How to fix and prevent vMotion issues
If your environment is not adequately prepared for vMotion, you could come across issues at any stage of a vMotion task. To begin with, revisit the requirements and make sure your environment is compliant.
This KB article provides a complete list. I picked a salient few as described next.
1 – Is the vMotion VMkernel option enabled?
Every vMotion-capable host must, by definition, have the vMotion option enabled on at least one VMkernel adapter. Otherwise, a VM will simply fail to migrate to any host where the option has not been enabled.
In the following example (Fig. 3), I disabled vMotion on the VMkernel adapter of one of my hosts. If I try to migrate a VM to this host, the vMotion task immediately fails.
Of course, this is just for illustrative purposes, but in large environments you can easily overlook enabling the setting on one or more hosts unless you’re automating or monitoring the process.
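One way to spot a host where the option was missed is to query the VMkernel adapter’s tags from the ESXi shell. The following is a minimal sketch, assuming an ESXi release that ships the `esxcli network ip interface tag` namespace. The helper name `has_vmotion_tag` is made up for illustration and vmk0 is simply the adapter used in this post.

```shell
# Succeeds (exit code 0) when the given VMkernel adapter carries the VMotion tag.
# esxcli prints the tag list as a line such as "Tags: Management, VMotion".
# has_vmotion_tag is a hypothetical helper name, not a VMware command.
has_vmotion_tag() {
    esxcli network ip interface tag get -i "$1" | grep -qi 'vmotion'
}

# Example usage from the ESXi shell:
# has_vmotion_tag vmk0 && echo "vMotion enabled on vmk0" || echo "vMotion NOT enabled on vmk0"
```

If the check fails, `esxcli network ip interface tag add -i vmk0 -t VMotion` should enable the option on that adapter without going through the vSphere client.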
2 – Check your advanced settings
Another common error you might come across is the infamous vMotion fails at 10% (1013150). Although you’re more likely to see this on older versions of ESXi / vCenter Server, I did come across it a couple of times when using vSphere 6.5. Regardless, if you’re experiencing vMotion issues, make sure that the Migrate.Enabled advanced setting is set to 1 on all ESXi hosts.
Backup software will sometimes set this value to 0 (disabled) to ensure that backup jobs complete successfully, by preventing a VM that is being backed up from being migrated to another host. Occasionally, an unplanned network or storage outage prevents the backup software from rolling the setting back to its original value, leaving the VM in vMotion limbo.
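The setting can also be read straight from the ESXi shell. This is a rough sketch that assumes the usual `esxcli system settings advanced list` output, where the current value appears on a line starting with "Int Value:". The function name `migrate_enabled_value` is hypothetical.

```shell
# Print the current integer value of Migrate.Enabled (1 = enabled, 0 = disabled).
# esxcli lists the option as several "Name: value" lines; only "Int Value" is kept.
# The anchored pattern skips the separate "Default Int Value" line.
# migrate_enabled_value is a hypothetical helper name.
migrate_enabled_value() {
    esxcli system settings advanced list -o /Migrate/Enabled |
        sed -n 's/^ *Int Value: *//p'
}

# Example usage from the ESXi shell:
# [ "$(migrate_enabled_value)" = "1" ] || echo "Migrate.Enabled is not 1 on this host"
```

Should the value turn out to be 0, `esxcli system settings advanced set -o /Migrate/Enabled -i 1` puts it back to 1.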
3 – Run network diagnostics at the VMkernel level
VMkernel network connectivity can be another problem area, resulting in timeouts or outright failed migrations. When testing network connectivity between hosts, you will want to test from a VMkernel perspective, especially if your hosts are running multiple VMkernel adapters.
There are a couple of ESXi shell commands that help you do this: vmkping and nc. vmkping sends ICMP traffic to a destination host through a VMkernel’s TCP/IP stack, as opposed to the host’s physical interface TCP/IP stack, which is what a normal ping utility would use.
In my environment, I’ve set all my hosts to use vmk0 for vMotion. In the next example, I am verifying that host 192.168.16.69 is reachable from all the VMkernel adapters on the ESXi host where I’m running the command. The -I parameter tells vmkping which VMkernel adapter to use for the connectivity test.
Additionally, I’ve used the nc (netcat) command to ensure that the source host can connect to the vMotion network port (8000) on the destination host.
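Both checks can be wrapped in a small shell function. Treat this as a sketch only: the name `check_vmotion_path` is invented for this example, and it assumes that vmkping exits with a non-zero code when no replies come back and that `nc -z` is available in your ESXi build.

```shell
# Check that a destination host answers ICMP over a given VMkernel adapter and
# accepts TCP connections on the vMotion port (8000).
# $1 = destination IP, $2 = VMkernel adapter to send from (for example vmk0).
# check_vmotion_path is a hypothetical helper name.
check_vmotion_path() {
    target="$1"
    vmk="$2"

    # -I selects the outgoing VMkernel adapter, -c limits the number of pings.
    # Assumption: vmkping returns non-zero when the target does not reply.
    if ! vmkping -I "$vmk" -c 3 "$target" >/dev/null 2>&1; then
        echo "FAIL: $target not reachable from $vmk"
        return 1
    fi

    # -z only tests whether the port accepts a connection; no data is sent.
    if ! nc -z "$target" 8000 >/dev/null 2>&1; then
        echo "FAIL: TCP port 8000 on $target is not reachable"
        return 1
    fi

    echo "OK: $target reachable from $vmk, port 8000 open"
}

# Example usage from the ESXi shell, once per VMkernel adapter:
# check_vmotion_path 192.168.16.69 vmk0
```

Running it for every adapter on the source host mirrors the manual test described above, with the difference that a failure tells you straight away whether ICMP or the vMotion port is the problem.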
4 – Dismount unused ISOs
If a virtual machine has a mounted ISO image residing on storage not accessible by the ESXi host where you want the VM migrated to, vMotion will fail with the following remote backing (1003780) error.
This is another common issue and an easy one to fix. Simply unmount the image from the CD/DVD drive in the VM’s settings as shown next.
Alternatively, untick the Connected option next to the CD/DVD device and answer Yes to the eject question.
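To find candidates before a migration fails, you can search a datastore for VM configuration files that declare an ISO-backed CD/DVD drive. The sketch below assumes such a drive shows up in the .vmx file as a `deviceType = "cdrom-image"` line. The function name `find_iso_backed_vms` and the datastore path in the usage comment are placeholders. Note that it reports configured ISO drives whether or not they are currently connected.

```shell
# Print every .vmx file below the given directory that declares an ISO-backed
# CD/DVD drive, i.e. contains a line like: ide1:0.deviceType = "cdrom-image"
# find_iso_backed_vms is a hypothetical helper name.
find_iso_backed_vms() {
    find "$1" -name '*.vmx' -exec grep -l 'deviceType = "cdrom-image"' {} \;
}

# Example usage from the ESXi shell (datastore1 is a placeholder name):
# find_iso_backed_vms /vmfs/volumes/datastore1
```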
5 – Is your time in sync?
Ensure that the clocks on your ESXi hosts are kept in sync by specifying a common NTP source. Time drift per se does not influence vMotion. There is, however, a catch. When a VM is migrated to a host with an out-of-sync clock, the VM’s guest OS clock will adjust accordingly. This means that if the time on the ESXi host is off, so is the time on your VM. Not a good thing if you’re migrating something like an AD domain controller that acts as the PDC for the whole domain.
You are probably familiar with the Synchronize guest time with host VMware Tools option. What you may not know is that, even with the option disabled, there are instances where a VM will still synchronize its clock with that of the ESXi host. Events that trigger this behavior include taking snapshots, restarting VMware Tools and, in our case, vMotioning a VM.
To completely disable time synchronization at the VM level, the lines listed next have to be included in the VM’s configuration file as per this KB article.
Option | Value |
tools.syncTime | 0 |
time.synchronize.continue | 0 |
time.synchronize.restore | 0 |
time.synchronize.resume.disk | 0 |
time.synchronize.shrink | 0 |
time.synchronize.tools.startup | 0 |
time.synchronize.tools.enable | 0 |
time.synchronize.resume.host | 0 |
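To verify that a VM’s configuration file actually carries all eight options, something along these lines can help. It is a sketch under a few assumptions: the .vmx file uses the usual `key = "value"` layout, only the literal value 0 from the table above counts as disabled, and `check_time_sync_disabled` is a made-up name.

```shell
# Report which time synchronization options are not set to "0" in a .vmx file.
# Returns 0 when all eight options are disabled, 1 otherwise.
# check_time_sync_disabled is a hypothetical helper name.
check_time_sync_disabled() {
    vmx="$1"
    status=0
    for key in tools.syncTime \
               time.synchronize.continue \
               time.synchronize.restore \
               time.synchronize.resume.disk \
               time.synchronize.shrink \
               time.synchronize.tools.startup \
               time.synchronize.tools.enable \
               time.synchronize.resume.host; do
        # Match the key at the start of a line; the dots act as regex wildcards,
        # which is harmless for this fixed list of option names.
        if ! grep -q "^$key *= *\"0\"" "$vmx"; then
            echo "not disabled: $key"
            status=1
        fi
    done
    return $status
}

# Example usage from the ESXi shell (the path is a placeholder):
# check_time_sync_disabled /vmfs/volumes/datastore1/MyVM/MyVM.vmx
```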
I use the following PowerCLI one-liner to quickly retrieve the current time on all the hosts managed by the vCenter Server instances I’m connected to.
foreach ($esx in (get-vmhost)) {$esx.Name + " -> " + (get-view $esx.ExtensionData.ConfigManager.DateTimeSystem).QueryDateTime().ToLocalTime()}
The following links describe the symptoms and possible solutions to other common and not so common vMotion issues.
- VMware vMotion fails if target host does not meet reservation requirements
- Investigating disk space on an ESX or ESXi host
- vMotion fails at 10% with the error: Operation timed out
- vMotion fails at 14% with the error: Timed out waiting for migration start request
- Performing vMotion fails at 14% despite vmkping succeeding from source to target IP address
- vMotion fails at 90% with the error: A general system error occurred: failed to resume on destination message
Conclusion
We’ve seen how a few tips and checks go a long way to ensure optimal vMotion functionality as well as prevent some common vMotion issues you might bump into. In my vSphere Networking Basics Part 1 and Part 2 posts, I go into some depth on how to set up VMkernels and how service-specific TCP/IP stacks can improve performance, so do have a look before you leave.