VMware Troubleshooting

Troubleshooting complex virtualization technology is something that system administrators have to face at some point, and it’s not always easy to get things up and running again. This webinar covers the most common problems experienced in VMware vSphere.



Get the Slide Deck

I've uploaded the slide deck and it can be downloaded HERE.

Webinar Q&A

Q: When viewing events live in the web client, I’m surprised how limited they are in duration and detail. Is there a default duration for these live event logs? Is a longer history available in non-online log files?

The live view is limited by design to keep your environment moving smoothly. If vCenter stored and displayed all historical logs and events, it would get quite sluggish.

This question has come up in the VMware communities as well. You can easily look for older events using PowerCLI. See: https://communities.vmware.com/message/1984255#1984255
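
As a rough sketch of the PowerCLI approach, the helper below pulls events from a chosen number of days back. It assumes PowerCLI is installed and that Connect-VIServer has already been run. The function name Get-OlderVIEvent, the 30-day window, and the 1000-event cap are illustrative choices, not something from the webinar.

```powershell
# Hypothetical helper: returns vCenter events from the last $Days days.
# Assumes VMware PowerCLI is loaded and Connect-VIServer has already been run.
function Get-OlderVIEvent {
    param(
        [int]$Days = 30,          # how far back to look (illustrative default)
        [int]$MaxSamples = 1000   # raise the cap; Get-VIEvent limits results by default
    )
    $finish = Get-Date
    $start  = $finish.AddDays(-$Days)

    # Keep only the columns that are usually useful when reading history
    Get-VIEvent -Start $start -Finish $finish -MaxSamples $MaxSamples |
        Select-Object CreatedTime, UserName, FullFormattedMessage
}

# Example: Get-OlderVIEvent -Days 90 | Export-Csv events.csv -NoTypeInformation
```

Raising -MaxSamples matters here, because without it a long time window is silently truncated to the most recent events.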

Q: When getting All Paths Down (APD) errors, I’ve been unable to convince the host to reconnect to the recovered storage without host reboot. Do you have a better way to reconnect a datastore that had been APD?

Before doing a host reboot, you should try to re-scan your storage. You can do that under your storage adapter on the ESXi host in question. A reboot will also re-scan your storage but a re-scan can be done without the host reboot.
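
The same rescan can be scripted with PowerCLI. This is a minimal sketch, assuming PowerCLI is installed and connected. The wrapper name Invoke-HostStorageRescan and the host name in the example are hypothetical placeholders.

```powershell
# Hypothetical wrapper: rescans all storage adapters (HBAs) and VMFS volumes on one host.
# Assumes VMware PowerCLI is loaded and Connect-VIServer has already been run.
function Invoke-HostStorageRescan {
    param(
        [Parameter(Mandatory)][string]$VMHostName   # name of the ESXi host as shown in vCenter
    )
    Get-VMHost -Name $VMHostName |
        Get-VMHostStorage -RescanAllHba -RescanVmfs
}

# Example (placeholder host name): Invoke-HostStorageRescan -VMHostName 'esxi01.lab.local'
```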

Q: Is the Ruby vSphere console only available on vMA?

No, the Ruby vSphere Console comes bundled with both the vCenter Server Appliance (VCSA) and the Windows version of vCenter Server.

Q: How do you set up vSAN nested with a magnetic/SSD combination, or either one, for a vSAN lab?

Because you said lab, I will focus on the lab aspect of the question. If you’re doing this in a production environment always follow the deployment guides. As far as a lab goes, you will likely have to go into your disks and mark the disk as either an SSD or HDD. One of my favorite vSAN labs is the one that William Lam has. He wrote a fantastic script that can deploy vCenter, ESXi hosts and vSAN all nested. Check it out here: http://www.virtuallyghetto.com/2016/11/vghetto-automated-vsphere-lab-deployment-for-vsphere-6-0u2-vsphere-6-5.html

Q: Any default value for %RDY, %CSTP, and %used?

There are no default values per se, only general troubleshooting guidelines.

In vCenter performance charts, CPU Ready is reported in milliseconds (ms), while esxtop shows it as a percentage (%RDY). VMware’s best practice guidelines indicate it is best to keep your VMs below 5% CPU Ready per vCPU. The lower the %RDY value, the better; the generally “acceptable” threshold is around 10%, but aim for 5% or less.
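
Because vCenter charts report CPU Ready as a millisecond summation while the guidelines are percentages, a conversion helps when comparing the two. The sketch below divides the summation by the chart's sampling interval, assuming the 20-second interval of the real-time chart. The function name is hypothetical.

```powershell
# Hypothetical helper: converts a CPU Ready summation (ms) from a vCenter chart
# into a percentage of the sampling interval.
function ConvertTo-CpuReadyPercent {
    param(
        [Parameter(Mandatory)][double]$ReadyMs,   # summation value read from the chart
        [double]$IntervalSeconds = 20             # real-time charts sample every 20 seconds
    )
    ($ReadyMs * 100) / ($IntervalSeconds * 1000)
}

ConvertTo-CpuReadyPercent -ReadyMs 1000   # 1000 ms out of 20,000 ms -> 5
```

Historical charts use longer intervals, so pass the matching -IntervalSeconds value when reading those.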

In most cases a high %CSTP value is a good indicator that the number of vCPUs on a VM has been over-provisioned.  In some cases a high %CSTP value can also be seen during VM snapshot activities.

esxtop works from snapshots of its counters, taken every 5 seconds by default. To display a metric, it compares two successive snapshots and shows the difference. The shortest snapshot interval is 2 seconds. Metrics such as %USED and %RUN show the CPU occupancy delta between successive snapshots, so their values will vary in each environment.

Q: I recently started getting an error in my VM backup solution that it cannot quiesce the virtual machine. I am receiving the same error with Altaro, and I tried Acronis Backup for VMware and it gives the same error. I am going to assume the issue is with the VM itself. The VM is on a host that has 3 other VMs. Only this one cannot be backed up. I’ve tried a lot of troubleshooting but it still does not work.

Verify that VMware Tools is up to date and that the service is running with no errors. When a failure like that occurs, I always check VMware Tools first, and that typically resolves it.
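
A quick way to check VMware Tools across many VMs is to read the guest properties that vCenter already tracks. This is only a sketch: the function name Get-ToolsState is hypothetical, and it assumes the objects piped in come from PowerCLI's Get-VM.

```powershell
# Hypothetical helper: summarizes VMware Tools version and running state per VM.
# Expects VM objects from Get-VM (PowerCLI) on the pipeline.
function Get-ToolsState {
    param(
        [Parameter(ValueFromPipeline)]$VM
    )
    process {
        [pscustomobject]@{
            Name          = $VM.Name
            VersionStatus = $VM.ExtensionData.Guest.ToolsVersionStatus   # e.g. guestToolsNeedUpgrade
            RunningStatus = $VM.ExtensionData.Guest.ToolsRunningStatus   # e.g. guestToolsRunning
        }
    }
}

# Example: Get-VM | Get-ToolsState | Where-Object RunningStatus -ne 'guestToolsRunning'
```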

Q: When using virtual backup solutions, one sometimes comes across consolidation issues due to a lock on the delta files. There are various troubleshooting measures on the internet, but many are time intensive. Do you have any insight on resolving these “virtual machine needs consolidation” type errors?

One of the best ways to find out whether you have multiple VMs that need consolidation is to filter the output of the Get-VM cmdlet. The line below lists the VMs that need consolidation.

					Get-VM | Where-Object {$_.ExtensionData.Runtime.ConsolidationNeeded}
					

The line below can consolidate all the VMs that need it.

					Get-VM | Where-Object {$_.Extensiondata.Runtime.ConsolidationNeeded} | foreach {$_.ExtensionData.ConsolidateVMDisks_Task()}
					
Q: What % of companies use VMs?

I’m not sure there are exact numbers on this, but one thing I will always remember is that at VMworld in 2009, VMware announced that more virtual machines were being deployed at that time than physical servers.

More and more companies are moving to a virtual infrastructure as the default. You now have to really justify WHY you need a physical server.

Q: With a home lab, I had a USB boot drive fail and no failover capacity due to hardware limits at the time. I was able to rebuild the vSphere 5.5 hypervisor. I was able to bring it back into vSAN but cannot see the files on the storage from the vSphere client or from SSH on the host. Is there any way to see the raw storage?

You might try to check to see if you can see storage from the Ruby vSphere Console. See: https://blogs.vmware.com/vsphere/2014/08/managing-virtual-san-rvc-part-2-navigating-vsphere-virtual-san-infrastructure-rvc.html

Q: Does it matter how large a VM is when taking snapshots? Are there size limits/restrictions, or at the very least guidelines, when taking snapshots? We use Altaro for backups in our environment and it utilizes snapshots, but I noticed that on one of our larger VMs (around 1.5TB in disk) it will fail in its backup process and require a disk consolidation task.

The maximum size of a snapshot will vary based on block size, overhead, etc. See KB 1012384.

Here are the VMware best practices on snapshots:

  • VMware recommends a maximum of 32 snapshots in a chain. However, for better performance, use only 2 to 3 snapshots.
  • Do not use a single snapshot for more than 24-72 hours.
  • When using third-party backup software, ensure that snapshots are deleted after a successful backup.
  • Ensure that there are no snapshots before:
    • Performing Storage vMotion in vSphere 4.x and earlier environments. vSphere 5.0 and later support Storage vMotion with snapshots on a virtual machine.
    • Increasing the virtual machine disk size or virtual RDM.
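
To check the 24-72 hour guideline above in practice, a small filter can flag snapshots that have outlived it. This is a sketch under assumptions: Select-StaleSnapshot is a hypothetical name, 72 hours is used as the default cutoff, and the input is expected to be snapshot objects from PowerCLI's Get-Snapshot, which expose a Created timestamp.

```powershell
# Hypothetical filter: passes through only snapshots older than $MaxAgeHours.
# Expects snapshot objects (with a Created property) on the pipeline.
function Select-StaleSnapshot {
    param(
        [Parameter(ValueFromPipeline)]$Snapshot,
        [int]$MaxAgeHours = 72    # upper end of the 24-72 hour guideline
    )
    begin {
        # Anything created before this point in time counts as stale
        $cutoff = (Get-Date).AddHours(-$MaxAgeHours)
    }
    process {
        if ($Snapshot.Created -lt $cutoff) { $Snapshot }
    }
}

# Example: Get-VM | Get-Snapshot | Select-StaleSnapshot | Select-Object VM, Name, Created, SizeGB
```

Running this on a schedule is one way to catch snapshots that a backup job left behind.
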
Q: What is the best software to convert from VMware to Hyper-V?

For VMware to Hyper-V, I would use Microsoft Virtual Machine Converter. For Hyper-V to VMware, I would use VMware Converter. Both are free.

Q: Can you put some links on lab hardware to amazon that would work (motherboards or bare bone units)?

This is a question I get asked often. I have actually outlined a few options on my blog. See the links below!

I will leave a parts list for an Intel NUC lab here:


Parts List:

Q: Are there any lab environments available in cloud?

Yes, VMware has great Hands-on Labs. They typically last only an hour or two, so be prepared to move quickly. You can always do them multiple times, but they won’t pick up where you left off.

You can find them here: http://labs.hol.vmware.com/HOL/catalogs/

Q: Migrating from vCenter Windows 6.5 to Appliance is not possible, or is it?

Not at this time. If you have already upgraded a vCenter on Windows to 6.5 you can’t migrate to the appliance. The migration utility in the vCSA installer only allows Windows vCenter on 5.5 or 6.0 to be migrated.

Q: I have two hosts when configuring HA says I have to do redistricting of the network cards.

I’m not familiar with the term redistricting, at least from a VMware standpoint. In general, with an HA network it is best practice to maintain two separate management networks for redundancy, so you really want to split them up. You can use a VLAN, but it’s best to keep them on a different set of switches if possible for the best redundancy.

Q: What is the best way of backing up vCenter on Windows?

On a Windows-based vCenter, it is easiest to use a product like Altaro VM Backup to back up your vCenter on a schedule.

Q: Is Bus Sharing required for Physical RDM LUN on VM?

RDMs come in two compatibility modes: physical compatibility mode and virtual compatibility mode. VMware outlines the differences in KB 2009226.

  • Physical mode is useful while running SAN management agents or other SCSI target-based software in the virtual machine.
  • Physical mode also allows virtual-to-physical clustering for cost-effective high availability.
  • Virtual Machine Snapshots are not available when the RDM is used in physical compatibility mode.
  • You can use physical mode for physical-to-virtual clustering and cluster-across-boxes.
  • Virtual mode is more portable across storage hardware than physical mode, presenting the same behavior as a virtual disk file.
  • You can use virtual mode for both Cluster-in-a-box and cluster-across-boxes.

Meet the Speakers

Andy Syrewicze

@asyrewicze

Cloud & Data Management MVP

Technical Evangelist - Altaro

Andy is a 15+ year IT pro specializing in Virtualization, Storage, Cloud, and Infrastructure. By day he’s a Technical Evangelist for Altaro, leading technical content and pre-sales. By night he shares his IT knowledge online or over a cold beer. He holds the Microsoft MVP award in Cloud and Datacenter Management and is one of the few who is also a VMware vExpert.

Ryan Birk

@ryanbirk

VMware vExpert 12-17

VMware Certified Instructor

Ryan has been working in IT for over 12 years. He’s been a System Admin, Virtualization Consultant, Engineer, and is now a Technical Instructor. Since 2012, he has been a proud VMware vExpert. He also runs his popular blog ryanbirk.com, which focuses on VMware home lab best practices.