by Jason Fenech in vSphere
Tags: ESXi, Howto, vCenter, vCenter Server Appliance, vMotion, VMware, vSphere 6.5

5 Awesome Tips to Troubleshoot vMotion

09 Oct 2017 by Jason Fenech
In Why IT Admins Adore vMotion, I described the main types of vSphere live migrations: vMotion, Storage vMotion and Enhanced vMotion. In this post, I’ll go over five tips to help you troubleshoot vMotion issues on the fly.

 

How does vMotion work?


A vMotion event can be initiated by a user, by vCenter Server itself or via an API call. There are 8 stages to a vMotion event as illustrated in Fig. 1.

Figure 1 – A vMotion task lifecycle

 

Although the above steps are self-explanatory, there’s much more going on under the hood. Let’s just say that vMotion is a marvelous piece of engineering. Here’s a video by VMware explaining vMotion and how to use it.

 

How to fix and prevent vMotion issues


If your environment is not adequately prepared for vMotion, you could come across issues at any stage of a vMotion task. To begin with, revisit the requirements and make sure your environment is compliant.

This KB article provides a complete list. I picked a salient few as described next.

 

1 – Is the vMotion VMkernel option enabled?

Every vMotion-capable host must, by definition, have the vMotion option enabled on at least one VMkernel adapter. Otherwise, a VM will simply fail to migrate to any host where the option has not been enabled.

In the following example, I disabled the vMotion option (see Fig. 3) on the VMkernel adapter of one of my hosts. If I try to migrate a VM to this host, the vMotion task fails immediately (Fig. 2).

Figure 2 – The error returned when a VM is migrated to a host that is not set up for vMotion

 

Of course, this is just for illustrative purposes. In a large environment, however, it is easy to overlook the setting on one or more hosts unless you’re automating or monitoring the process.

Figure 3 – Enabling the vMotion option on a VMkernel adapter
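For larger environments, the check above can be scripted. The following is a minimal PowerCLI sketch, assuming PowerCLI is loaded and a Connect-VIServer session already exists. The function name Get-HostWithoutVMotion is my own and purely illustrative.

```powershell
# Hypothetical helper: lists hosts that have no VMkernel adapter with vMotion enabled.
# Assumes PowerCLI is loaded and Connect-VIServer has already been run.
function Get-HostWithoutVMotion {
    foreach ($esx in (Get-VMHost)) {
        # Keep only the VMkernel adapters on this host that are flagged for vMotion
        $enabled = @(Get-VMHostNetworkAdapter -VMHost $esx -VMKernel |
            Where-Object { $_.VMotionEnabled })

        # A host with no such adapter cannot act as a vMotion source or target
        if ($enabled.Count -eq 0) { $esx.Name }
    }
}
```

Running Get-HostWithoutVMotion prints the name of every host that needs attention. The option can then be switched on by piping the relevant adapter to Set-VMHostNetworkAdapter -VMotionEnabled $true.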

 

2 – Check your advanced settings

Another common error you might come across is the infamous vMotion fails at 10% (1013150). Although you’re more likely to see this on older versions of ESXi / vCenter Server, I did come across it a couple of times when using vSphere 6.5. Regardless, if you’re experiencing vMotioning issues, make sure that the migrate.enabled advanced setting is set to 1 on all ESXi hosts.

Backup software will sometimes set this value to 0 (disabled) to ensure that backup jobs complete successfully by preventing a VM that is being backed up from vMotioning to another host. As sometimes happens, an unplanned network or storage outage may prevent the backup software from rolling the setting back to its original value, leaving the VM in vMotion limbo.

Figure 4 – Ensuring the migrate.enabled advanced setting is set to 1
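Checking the setting host by host gets tedious, so here is a rough PowerCLI sketch that reports it for every host in one go. It assumes a connected vCenter session and that the setting is exposed under the name Migrate.Enabled. The function name Get-MigrateEnabledSetting is hypothetical.

```powershell
# Hypothetical helper: reports the Migrate.Enabled advanced setting on every host.
# Assumes PowerCLI is loaded and Connect-VIServer has already been run.
function Get-MigrateEnabledSetting {
    foreach ($esx in (Get-VMHost)) {
        $setting = Get-AdvancedSetting -Entity $esx -Name 'Migrate.Enabled'

        # Emit one row per host; Healthy is true only when the value is 1
        [pscustomobject]@{
            Host    = $esx.Name
            Value   = $setting.Value
            Healthy = ($setting.Value -eq 1)
        }
    }
}
```

Any host reported with Healthy set to False can be corrected by piping the same Get-AdvancedSetting call to Set-AdvancedSetting -Value 1 -Confirm:$false.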

 

3 – Run network diagnostics at the VMkernel level

VMkernel network connectivity can be another problem area, resulting in timeouts or outright failed migrations. When testing network connectivity between hosts, you will want to test from a VMkernel perspective, even more so if your hosts are running multiple VMkernel adapters.

Two ESXi shell commands help you do this: vmkping and nc. The vmkping command sends ICMP traffic to a destination host through a VMkernel’s TCP/IP stack, as opposed to the host’s physical interface TCP/IP stack, which is what a normal ping utility would use.

In my environment, I’ve set all my hosts to use vmk0 for vMotion. In the next example, I verify that host 192.168.16.69 is reachable from all the VMkernel adapters on the ESXi host where I’m running the command. The -I parameter tells vmkping which VMkernel adapter to use for the connectivity test.

Additionally, I’ve used the nc (netcat) command to ensure that the source host can connect to the vMotion network port (8000) on the destination host.

Figure 5 – Performing network diagnostics using vmkping and nc from ESXi’s shell
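If you would rather not open an SSH session on each host, a similar ping test can be driven from PowerCLI through the esxcli network diag ping namespace. Treat the following as a sketch only. It assumes the V2 esxcli interface, the argument names host, interface and count, and a Summary.PacketLost property on the result, all of which should be verified against your PowerCLI and ESXi versions. The function name Test-VMkernelPing is hypothetical, and the port 8000 check done with nc is not covered here.

```powershell
# Hypothetical helper: pings $Target from every VMkernel adapter on $VMHost
# through the esxcli "network diag ping" namespace (V2 interface).
# Assumes PowerCLI is loaded and Connect-VIServer has already been run.
function Test-VMkernelPing {
    param(
        [Parameter(Mandatory)] $VMHost,
        [Parameter(Mandatory)] [string] $Target
    )

    $esxcli = Get-EsxCli -VMHost $VMHost -V2

    foreach ($vmk in (Get-VMHostNetworkAdapter -VMHost $VMHost -VMKernel)) {
        # Argument names are assumed to mirror the esxcli command-line options
        $reply = $esxcli.network.diag.ping.Invoke(@{
            host      = $Target
            interface = $vmk.Name
            count     = 3
        })

        # PacketLost is assumed to be a percentage; 0 means every reply came back
        [pscustomobject]@{
            Adapter    = $vmk.Name
            PacketLost = $reply.Summary.PacketLost
        }
    }
}
```

As an example, Test-VMkernelPing -VMHost (Get-VMHost 'esx01.lab.local') -Target 192.168.16.69 would test the same destination used above, where esx01.lab.local is a made-up host name.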

 

4 – Dismount unused ISOs

If a virtual machine has a mounted ISO image residing on storage not accessible by the ESXi host where you want the VM migrated to, vMotion will fail with the following remote backing (1003780) error.

Figure 6 – The error returned when you try and migrate a VM with a mounted ISO image

 

This is another common issue and an easy one to fix. Simply unmount the image from the CD/DVD drive in the VM’s settings as shown next.

Figure 7 – Unmounting an ISO image from a VM to allow it to migrate

 

Alternatively, untick the Connected option next to the CD/DVD device and answer Yes to the eject question.

Figure 8 – Confirming a user initiated media disconnect
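To find such VMs before a migration fails, something like the following PowerCLI sketch can list every VM that still has an ISO attached. It assumes a connected vCenter session, and the function name Get-VMWithMountedIso is hypothetical.

```powershell
# Hypothetical helper: lists VMs whose CD/DVD drive is backed by an ISO image.
# Assumes PowerCLI is loaded and Connect-VIServer has already been run.
function Get-VMWithMountedIso {
    Get-VM | Get-CDDrive |
        # IsoPath is empty when the drive has no ISO attached
        Where-Object { $_.IsoPath } |
        Select-Object @{ Name = 'VM'; Expression = { $_.Parent.Name } }, IsoPath
}
```

To eject the images in bulk, the same Get-VM | Get-CDDrive | Where-Object pipeline can be piped to Set-CDDrive -NoMedia -Confirm:$false. A guest that has the disc locked may still raise the eject question shown in Figure 8.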

 

5 – Is your time in sync?

Ensure that the clocks on your ESXi hosts are kept in sync by specifying a common NTP source. Time drift per se does not influence vMotion, but there is a catch. When a VM is migrated to a host with an out-of-sync clock, the VM’s guest OS clock will adjust accordingly. This means that if the time on the ESXi host is off, so is the time on your VM. That is not a good thing if you’re migrating something like an AD domain controller that acts as a PDC for the whole domain.

You are probably familiar with the Synchronize guest time with host VMware Tools option. What you may not know is that, even when the option is disabled, there are instances where a VM will still synchronize its clock with that of the ESXi host. Events that trigger this behavior include taking snapshots, restarting vmtools and, in our case, vMotioning a VM.

To completely disable time synchronization at the VM level, the lines listed next have to be included in the VM’s configuration file as per this KB article.

Option                            Value
tools.syncTime                    0
time.synchronize.continue         0
time.synchronize.restore          0
time.synchronize.resume.disk      0
time.synchronize.shrink           0
time.synchronize.tools.startup    0
time.synchronize.tools.enable     0
time.synchronize.resume.host      0
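Rather than editing the configuration file by hand, the options listed above could be pushed to a VM with PowerCLI. This is a sketch under assumptions: it relies on New-AdvancedSetting with -Force to create or overwrite each entry, the function name Disable-VMTimeSync is hypothetical, and whether a running VM picks up the change without a power cycle is not covered here.

```powershell
# Hypothetical helper: writes the eight time synchronization options to a VM's configuration.
# Assumes PowerCLI is loaded and Connect-VIServer has already been run.
function Disable-VMTimeSync {
    param([Parameter(Mandatory)] $VM)

    # Option names taken from the list in this post; every value is 0 (disabled)
    $options = @(
        'tools.syncTime',
        'time.synchronize.continue',
        'time.synchronize.restore',
        'time.synchronize.resume.disk',
        'time.synchronize.shrink',
        'time.synchronize.tools.startup',
        'time.synchronize.tools.enable',
        'time.synchronize.resume.host'
    )

    foreach ($name in $options) {
        # -Force overwrites the entry if it already exists
        New-AdvancedSetting -Entity $VM -Name $name -Value 0 -Confirm:$false -Force | Out-Null
    }
}
```

A call such as Disable-VMTimeSync -VM (Get-VM 'dc01') would apply the settings to a single VM, where dc01 is a made-up name.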

 

I use the following PowerCLI one-liner to quickly retrieve the time setting on all the hosts managed by the vCenter Server instances I am connected to.

PowerShell
# Print each host's name followed by its current clock reading, converted to local time
foreach ($esx in (Get-VMHost)) { $esx.Name + " -> " + (Get-View $esx.ExtensionData.ConfigManager.DateTimeSystem).QueryDateTime().ToLocalTime() }

Figure 10 – A PowerCLI command that retrieves the time setting on vCenter managed ESXi hosts
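Eyeballing timestamps only goes so far, so the one-liner can be extended to flag hosts that have drifted. The sketch below compares each host's clock with the clock of the machine running PowerCLI, which is therefore assumed to be accurate. It also assumes QueryDateTime() returns UTC. The function name Get-HostClockDrift and the 5 second default threshold are my own choices.

```powershell
# Hypothetical helper: compares each host's clock with the local workstation's clock.
# Assumes PowerCLI is loaded, Connect-VIServer has been run, and the workstation clock is accurate.
function Get-HostClockDrift {
    param([int] $ThresholdSeconds = 5)

    foreach ($esx in (Get-VMHost)) {
        $dts     = Get-View $esx.ExtensionData.ConfigManager.DateTimeSystem
        $hostUtc = $dts.QueryDateTime()   # assumed to be returned in UTC

        # Positive drift means the host clock is ahead of the workstation
        $drift = ($hostUtc - [DateTime]::UtcNow).TotalSeconds

        [pscustomobject]@{
            Host         = $esx.Name
            DriftSeconds = [math]::Round($drift, 1)
            OutOfSync    = ([math]::Abs($drift) -gt $ThresholdSeconds)
        }
    }
}
```

Because the hosts are queried one after the other, network latency adds a small error to each reading, so the threshold should not be set too tightly.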

 

The following links describe the symptoms and possible solutions to other common and not so common vMotion issues.

  • VMware vMotion fails if target host does not meet reservation requirements
  • Investigating disk space on an ESX or ESXi host
  • vMotion fails at 10% with the error: Operation timed out
  • vMotion fails at 14% with the error: Timed out waiting for migration start request
  • Performing vMotion fails at 14% despite vmkping succeeding from source to target IP address
  • vMotion fails at 90% with the error: A general system error occurred: failed to resume on destination message

 

Conclusion


We’ve seen how a few tips and checks go a long way to ensure optimal vMotion functionality as well as prevent some of the common vMotion issues you might bump into. In my vSphere Networking Basics Part 1 and Part 2 posts, I go into some depth on how to set up VMkernels and how service-specific TCP/IP stacks can improve performance, so do have a look before you leave.

Jason Fenech

An IT veteran for over 23 years, I covered various roles throughout my career. Prior to joining Altaro as a blog writer and QA tester, I was employed as an infrastructure engineer at a cloud services provider working exclusively with VMware products. The Altaro VMware blog enables me to share the experience and knowledge gained and, much to my surprise, is what got me the vExpert 2017 award. Besides being a techie and a science buff, I like to travel and play guitars. I also do some photography and love having a go at playing the occasional XBOX game, Halo being my absolute favourite. I am also a proud father of two and parent to a crazy Dachshund called Larry.

Copyright © 2018 Altaro Software.
