Nagios for Hyper-V: Alert on Failed Quorum

The health of a Microsoft Failover Cluster’s quorum leans most heavily on the state of the nodes. If you’re already using Nagios to monitor individual node states, then you’ll find out very quickly if any of them are down. Sometimes, though, the witness goes offline. If you aren’t monitoring it, you can run into other problems. For instance, you may opt to manually pause a node for maintenance. If the witness is already down, the combined loss might cause the entire cluster to go offline. This article presents a short Nagios detection script for the status of your quorum witness.

This script is useful for any cluster, not just Hyper-V clusters.

If you’re new to Nagios, then you should probably start with the How To: Monitor Hyper-V with Nagios article. I also published a follow-up article containing a script with some base functions for clusters, but that script is not required to use this one.

NSClient++ Configuration

These changes are to be made to the NSClient++ files on all Windows nodes that are part of the cluster to be monitored. These instructions do not include configuring NSClient++ to operate PowerShell scripts. Please refer to the aforementioned how-to article for that.


C:\Program Files\NSClient++\nsclient.ini

If the indicated INI section does not exist, create it. Otherwise, just add the second line to the existing section.

[/settings/external scripts/wrapped scripts]
check_clusterquorumwitness = check_clusterquorumwitness.ps1

The NSClient++ service must be restarted after all changes to its ini file.


C:\Program Files\NSClient++\scripts\check_clusterquorumwitness.ps1

Create the file with the following contents:

<#
	check_clusterquorumwitness.ps1
	Written by Eric Siron
	(c) Altaro Software 2017

	Version 1.0 January 22, 2017

	Intended for use with the NSClient++ module from http://nsclient.org
	Checks the cluster's quorum status and returns the status to Nagios.
#>

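# Stays empty unless the cluster uses a witness resource
$QuorumResourceName = [String]::Empty

# Quorum types 2 and 3 are treated as the witness-based modes; only then look up the quorum resource
if((Get-CimInstance -Namespace root\mscluster -ClassName MSCluster_Cluster).QuorumTypeValue -in @(2,3))
{
	$QuorumResourceName = (Get-CimInstance -Namespace root\mscluster -ClassName MSCluster_ClusterToQuorumResource).PartComponent.Name
}
if($QuorumResourceName)
{
	# A resource State of 2 means Online; anything else is reported to Nagios as critical
	if((Get-CimInstance -Namespace root\mscluster -ClassName MSCluster_Resource -Filter ('Name="{0}"' -f $QuorumResourceName)).State -ne 2)
	{
		Write-Host -Object 'Quorum offline'
		exit 2
	}
	else
	{
		Write-Host -Object 'Quorum online'
		exit 0
	}
}

# No witness resource is configured, so there is nothing to alert on
Write-Host -Object 'No external witness specified'
exit 0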
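
The script communicates with Nagios only through its output text and its exit code. As a rough illustration of how those exit codes are read under the standard Nagios plugin convention, here is a small shell sketch. The function name nagios_state is my own invention for this example and is not part of Nagios or NSClient++.

```shell
# Hypothetical helper (not part of Nagios): translate a plugin exit code
# into the state name that Nagios associates with it.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNKNOWN (unexpected exit code $1)" ;;
    esac
}

# Example usage:
#   nagios_state 2
```

The script above only ever exits with 0 or 2, so the service should show as OK when the witness is online (or when no witness is configured) and CRITICAL when the witness is offline.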

Nagios Configuration

These changes are to be made on the Nagios host. I recommend using WinSCP as outlined in our main Nagios and Ubuntu Server articles.


/usr/local/nagios/etc/objects/commands.cfg

The Hyper-V Host Commands section should already exist if you followed our main Nagios article. Add this command there. If you are not working with a Hyper-V system, then you can create any section heading that makes sense to you, or just insert the command wherever you like.

define command{
	command_name	check-clusterquorumwitness
	command_line	$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -p 5666 -c check_clusterquorumwitness
}
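
Before wiring the command into a service, it can help to run the same check by hand from the Nagios host. The sketch below reuses the flags from the command definition above. The NRPE_PLUGIN variable, its default path (which assumes $USER1$ points to /usr/local/nagios/libexec, as it does on a typical source install) and the function name test_quorum_check are assumptions for illustration only.

```shell
# Assumed location of check_nrpe; change this if your $USER1$ points elsewhere.
NRPE_PLUGIN="${NRPE_PLUGIN:-/usr/local/nagios/libexec/check_nrpe}"

# Hypothetical helper: run the quorum witness check against one target
# and show the exit code that Nagios would receive.
test_quorum_check() {
    target="$1"
    "$NRPE_PLUGIN" -H "$target" -t 30 -p 5666 -c check_clusterquorumwitness
    rc=$?
    echo "exit code: $rc"
    return "$rc"
}

# Example usage (replace clhv1 with your own cluster name object):
#   test_quorum_check clhv1
```

If the manual run does not print one of the script's three messages, the problem most likely lies in the NSClient++ configuration rather than in Nagios itself.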

/usr/local/nagios/etc/objects/hypervhost.cfg

This file and section were created in the Hyper-V base scripts article. The service definition will work as long as it appears somewhere in one of the activated .cfg files.

This is a sample! You must use your own cluster name object! If you have multiple clusters to monitor, remember that you can place them into a Nagios hostgroup. You can then apply this service to the group rather than the individual cluster name objects. Do not assign the service to the nodes! The monitor will still work, but it’s inefficient and failures will result in many duplicate notifications.

###############################################################################
###############################################################################
#
# CLUSTER SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# check CLHV1's quorum witness status
define service{
	use			generic-service
	host_name		clhv1
	service_description	Quorum Witness Status
	check_command		check-clusterquorumwitness
}

Nagios must be restarted after these files are modified. Run the two commands below separately. Do not just copy/paste both! If the first command indicates a validation failure, check your work and fix the problem before restarting the Nagios service!

sudo service nagios checkconfig
sudo service nagios restart
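
If you would rather not rely on remembering the order, one option is a small wrapper that only restarts when validation succeeds. This is a minimal sketch, not something from the main Nagios article. The function name restart_nagios_if_valid and the CHECK_CMD and RESTART_CMD variables are hypothetical; their defaults are simply the two commands shown above.

```shell
# Defaults are the two commands from this article; override them if your
# system manages the Nagios service differently.
CHECK_CMD="${CHECK_CMD:-sudo service nagios checkconfig}"
RESTART_CMD="${RESTART_CMD:-sudo service nagios restart}"

# Hypothetical helper: restart Nagios only if the configuration check passes.
restart_nagios_if_valid() {
    # The variables are left unquoted on purpose so that each one splits
    # into a command and its arguments.
    if $CHECK_CMD; then
        $RESTART_CMD
    else
        echo "Validation failed; Nagios was not restarted." >&2
        return 1
    fi
}

# Example usage:
#   restart_nagios_if_valid
```

You still need to read the validation output and fix whatever it reports. The wrapper only keeps a broken configuration from reaching the restart step.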