I’ve put this post together on the spur of the moment after coming across an issue I had not seen in a while, an unresponsive VM in vSphere. Yep, it does happen from time to time. By unresponsive, I mean a guest OS that will categorically ignore any remote call made to it and a VM that is completely oblivious to anything triggered from a vSphere client.
Thankfully, there are a number of methods used to kill the offending VM. The first port of call is PowerCLI. If this fails, you still have a number of options left at your disposal and they all involve running commands directly off an ESXi host.
The alternative to this, of course, is to reboot the ESXi host assuming you are able to migrate VMs elsewhere. Hopefully you will never have to resort to this act of desperation.
Let's start off with the easy one.
The PowerCLI Way
Stop-VM is the cmdlet normally used to power off a VM, but adding the -Kill parameter terminates the VM’s corresponding processes on ESXi, effectively killing it. In the following example, I’ll be terminating a VM called Centos 7 Web Server, so that is the name I would substitute for the placeholder.
Stop-VM -VM "<VM display name>" -Kill -Confirm:$false
The ESXi Way
There are three command line methods you can use on ESXi to terminate unresponsive VMs: esxcli, vim-cmd and plain old Unix-style commands, though technically ESXi doesn’t do Unix. You first need to enable SSH and/or the ESXi Shell on the host, after which you can connect with something like PuTTY or, if you prefer, work directly from the console. The esxtop tool presents itself as a fourth option.
So, if I had to kill an unresponsive VM using esxcli, I would use the vm namespace pretty much like this:
1) Get a list of the VMs hosted on ESXi.
esxcli vm process list
2) Copy the World ID value and run:
esxcli vm process kill --type=soft --world-id=796791
The command does not provide any feedback unless it fails to locate the VM or you specify an invalid parameter.
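If you would rather not copy the World ID by hand, the lookup can be scripted. What follows is only a minimal sketch: world_id_for is a hypothetical helper name of my own, and it assumes that awk is available in the ESXi shell and that each VM block in the listing starts with the unindented display name followed by indented fields.

```shell
# Hypothetical helper: print the World ID of the VM whose display name
# matches exactly. Reads `esxcli vm process list` output on stdin.
world_id_for() {
    awk -v name="$1" '
        /^[^ \t]/          { block = $0 }                   # unindented line = display name of a new VM block
        /^[ \t]+World ID:/ { if (block == name) print $NF } # last field on this line is the ID
    '
}

# Usage on the host (not executed here):
#   esxcli vm process list | world_id_for "Centos 7 Web Server"
```

The function prints nothing when no VM matches, so check for an empty result before passing it to the kill command.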
The type parameter takes one of three values:
- soft – Allows the VMX process to shut down gracefully, similar to kill -SIGTERM.
- hard – Immediately kills the VMX process, similar to a kill -9 or kill -SIGKILL command.
- force – Stops the VMX process when both the soft and hard options fail.
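The three types lend themselves to a simple escalation: try soft, then hard, then force, checking in between whether the world is still listed. Here is a rough sketch under a few assumptions of my own: kill_vm_world, ESXCLI and KILL_WAIT are hypothetical names, the 30-second default pause is a guess at a sensible wait, and the check relies on the listing printing a "World ID: <number>" line for every running VM.

```shell
# ESXCLI can be pointed at a stub for a dry run; it defaults to the real tool.
ESXCLI="${ESXCLI:-esxcli}"

# Hypothetical helper: escalate soft -> hard -> force for the given World ID,
# stopping as soon as the world no longer appears in the process list.
kill_vm_world() {
    wid="$1"
    for kill_type in soft hard force; do
        "$ESXCLI" vm process kill --type="$kill_type" --world-id="$wid"
        sleep "${KILL_WAIT:-30}"   # give the VMX process time to exit
        if ! "$ESXCLI" vm process list | grep -q "World ID: $wid\$"; then
            echo "World $wid gone after --type=$kill_type"
            return 0
        fi
    done
    echo "World $wid still running" >&2
    return 1
}

# Usage on the host (not executed here):
#   kill_vm_world 796791
```

A non-zero return means all three attempts failed, at which point the remaining methods in this post are the next thing to try.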
Vim-cmd is another command line utility, very similar to esxcli, but bearing a somewhat different syntax. Similarly, it can be used to manage VMs as well as other resources. Keeping in line with the subject matter, here’s how vim-cmd is used to terminate a VM.
1. List all the VMs on the currently logged on host with vim-cmd vmsvc/getallvms and note down the vmid (1st column) of the unresponsive VM.
2. Get the power state of the VM using its vmid. This step is optional since you already know that the VM is toast.
vim-cmd vmsvc/power.getstate 36
3. Try the power.shutdown option first.
vim-cmd vmsvc/power.shutdown 36
4. If for whatever reason the VM fails to shut down, try using power.off instead.
vim-cmd vmsvc/power.off 36
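Picking the vmid out of a long listing by eye is error-prone, so here is a small sketch that does the lookup by name. vmid_for is a hypothetical helper of my own, and it assumes that awk is available and that each row of the VM listing has the vmid first, then the name, then a file path that begins with a bracketed datastore name.

```shell
# Hypothetical helper: print the vmid of the VM whose Name column matches
# exactly. Reads `vim-cmd vmsvc/getallvms` output on stdin.
vmid_for() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+$/ {
            rest = $0
            sub(/^[ \t]*[0-9]+[ \t]+/, "", rest)    # drop the Vmid column
            if (substr(rest, 1, length(name)) != name) next
            tail = substr(rest, length(name) + 1)
            if (tail ~ /^[ \t]+\[/) print $1        # name must be followed by "[datastore]"
        }
    '
}

# Usage on the host (not executed here):
#   vim-cmd vmsvc/getallvms | vmid_for "Centos 7 Web Server"
```

Requiring the bracket right after the name keeps a VM called Centos 7 from also matching Centos 7 Web Server.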
The following screenshot provides an example of how the above commands are used to kill a VM.
ESXTOP is a great utility which provides information on how ESXi is using system resources. It can be used for troubleshooting, performance statistics and more. Keep this chart handy as it gives you a quick overview on how the tool may be used.
We’re going to use it to kill a hypothetically unresponsive VM. Here’s how.
- Press Shift+v to change the view to virtual machines.
- Press f to display the list of fields followed by c. This will add the Leader World ID (LWID) column to the view. Press any key to return to the main menu.
- Locate the unresponsive VM under the Name column and note down its LWID.
- Press k and type in the LWID value at the World to kill (WID) prompt. Press Enter.
Just to be doubly sure, wait for 30 seconds before verifying that the VM is no longer listed. If it still is, try again; failing that, you will probably need to reboot the host.
The brute force is with you
This is the crudest method I can think of to terminate a VM. It’s actually documented on the VMware site, albeit slightly differently. A VM is invariably a series of processes running on ESXi. Using the ps command below, I can list the processes associated with, say, a VM called Centos 7 Web Server.
ps | grep "Centos 7 Web Server"
If you look closely, you’ll see that the value in the second column is identical for all processes. This is the VMX Cartel ID value which, although different from the World ID value, can still be targeted to terminate a VM by passing it to kill, as in kill <process id>.
If the processes keep running, try kill -9 <process id>.
kill -9 797300
Disclaimer: Use this at your peril. It seems to work on any version of ESXi. I’ve actually used it on ESXi 6.5 while working on this post without any apparent side-effects. The main concern I have with this method is that you risk bringing ESXi to its knees if you inadvertently kill the wrong process.
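One way to lower the odds of killing the wrong process is to let a script pick the ID and refuse to answer when the match is ambiguous. The sketch below is just that, a sketch: cartel_id_for is a hypothetical helper of my own, and it assumes that awk and sort are available and that the ps output has the Cartel ID in the second column with the VM name somewhere on the same line.

```shell
# Hypothetical helper: print the single Cartel ID (second column) shared by
# every ps line that mentions the VM name. Reads ESXi `ps` output on stdin
# and fails if zero or several distinct IDs match.
cartel_id_for() {
    ids="$(awk -v name="$1" 'index($0, name) && $2 ~ /^[0-9]+$/ { print $2 }' | sort -u)"
    count="$(printf '%s\n' "$ids" | awk 'NF { n++ } END { print n + 0 }')"
    if [ "$count" -ne 1 ]; then
        echo "expected exactly one Cartel ID, found $count" >&2
        return 1
    fi
    printf '%s\n' "$ids"
}

# Usage on the host (not executed here):
#   ps | cartel_id_for "Centos 7 Web Server"
```

The helper only prints the ID and never calls kill itself, so the final decision stays with you.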
There are times when you’ll end up with unresponsive VMs, and a couple of things you could try are PowerCLI or any of the ESXi command line tools available to kill the buggers. When everything else fails, your best option would be to migrate well-behaved VMs to some other ESXi host and reboot the one hosting the unresponsive VMs. There is a chance that you’ll need to power off the host if the frozen VMs are preventing you from putting the ESXi host into maintenance mode. If you still have issues after this, I suggest giving VMware support a call!