The health of a Microsoft Failover Cluster’s quorum leans most heavily on the state of the nodes. If you’re already using Nagios to monitor individual node states, then you’ll find out very quickly if any of them are down. Sometimes, though, the witness goes offline. If you aren’t monitoring the witness, then you can run into other problems. For instance, you may opt to manually pause a node for maintenance. If the witness is already down, the combined loss might cause the entire cluster to go offline. This article presents a short Nagios detection script for the status of your quorum witness.
This script is useful for any cluster, not just Hyper-V clusters.
If you’re new to Nagios, then you should probably start with the How To: Monitor Hyper-V with Nagios article first. I did publish a follow-up article with a script with some base functions for a cluster, but that script is not required to use this one.
NSClient++ Configuration
These changes are to be made to the NSClient++ files on all Windows nodes that are part of the cluster to be monitored. These instructions do not include configuring NSClient++ to operate PowerShell scripts. Please refer to the aforementioned how-to article for that.
C:\Program Files\NSClient++\nsclient.ini
If the indicated INI section does not exist, create it. Otherwise, just add the second line to the existing section.
[/settings/external scripts/wrapped scripts]
check_clusterquorumwitness = check_clusterquorumwitness.ps1
The NSClient++ service must be restarted after all changes to its ini file.
C:\Program Files\NSClient++\scripts\check_clusterquorumwitness.ps1
Create the file with the following contents:
<#
    check_clusterquorumwitness.ps1
    Written by Eric Siron
    (c) Altaro Software 2017

    Version 1.0 January 22, 2017

    Intended for use with the NSClient++ module from http://nsclient.org
    Checks the cluster's quorum status and returns the status to Nagios.
#>

$QuorumResourceName = [String]::Empty
if((Get-CimInstance -Namespace root\mscluster -ClassName MSCluster_Cluster).QuorumTypeValue -in @(2,3))
{
    $QuorumResourceName = (Get-CimInstance -Namespace root\mscluster -ClassName MSCluster_ClusterToQuorumResource).PartComponent.Name
}

if($QuorumResourceName)
{
    if((Get-CimInstance -Namespace root\mscluster -ClassName MSCluster_Resource -Filter ('Name="{0}"' -f $QuorumResourceName)).State -ne 2)
    {
        Write-Host -Object 'Quorum offline'
        exit 2
    }
    else
    {
        Write-Host -Object 'Quorum online'
        exit 0
    }
}
Write-Host -Object 'No external witness specified'
exit 0
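The script reports its result the way any Nagios plugin does: one line of text plus an exit code. As a rough illustration of how those exit codes translate into Nagios states, here is a small shell sketch. The function name `nagios_state_name` is made up for this example and is not part of Nagios or NSClient++; the 0/1/2/3 mapping is the standard plugin convention.

```shell
#!/bin/sh
# Hypothetical helper: translate a plugin exit code into the state name
# that Nagios shows for it (standard plugin convention: 0-3).
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "INVALID" ;;   # outside the plugin convention
    esac
}

# The three outcomes of check_clusterquorumwitness.ps1 (message -> state)
printf '%s -> %s\n' 'Quorum online' "$(nagios_state_name 0)"
printf '%s -> %s\n' 'Quorum offline' "$(nagios_state_name 2)"
printf '%s -> %s\n' 'No external witness specified' "$(nagios_state_name 0)"
```

Note that the script never returns 1 or 3, so this check only ever shows as OK or CRITICAL in Nagios.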
Nagios Configuration
These changes are to be made on the Nagios host. I recommend using WinSCP as outlined in our main Nagios and Ubuntu Server articles.
/usr/local/nagios/etc/objects/commands.cfg
The Hyper-V Host Commands section should already exist if you followed our main Nagios article. Add this command there. If you are not working with a Hyper-V system, then you can create any section heading that makes sense to you, or just insert the command wherever you like.
define command{
        command_name    check-clusterquorumwitness
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -p 5666 -c check_clusterquorumwitness
        }
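Before wiring the command into a service, it can help to run the same check by hand from the Nagios host. The sketch below builds the command line that the definition above expands to. It assumes that `$USER1$` resolves to `/usr/local/nagios/libexec` (the usual default for a source install, but check your resource.cfg) and uses the sample cluster name `clhv1` in place of `$HOSTADDRESS$`. `PLUGIN_DIR`, `CLUSTER_HOST`, and `nrpe_quorum_cmd` are names invented for this example.

```shell
#!/bin/sh
# Assumed values; override them in the environment to match your system.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"   # assumed value of $USER1$
CLUSTER_HOST="${CLUSTER_HOST:-clhv1}"                   # your cluster name object

# Hypothetical helper: print the check_nrpe command line for a given host,
# using the same switches as the command definition.
nrpe_quorum_cmd() {
    printf '%s/check_nrpe -H %s -t 30 -p 5666 -c check_clusterquorumwitness\n' \
        "$PLUGIN_DIR" "$1"
}

cmd=$(nrpe_quorum_cmd "$CLUSTER_HOST")
echo "Running: $cmd"

if [ -x "$PLUGIN_DIR/check_nrpe" ]; then
    rc=0
    $cmd || rc=$?          # run the check and capture its exit code
    echo "Exit code: $rc"
else
    echo "check_nrpe not found in $PLUGIN_DIR"
fi
```

If everything is configured correctly, the output should be one of the script's three messages, with an exit code of 0 or 2.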
/usr/local/nagios/etc/objects/hypervhost.cfg
This file and section were created in the Hyper-V base scripts article. If you don't have them, place the service definition in any of the activated .cfg files; it will work from there.
This is a sample! You must use your own cluster name object! If you have multiple clusters to monitor, remember that you can place them into a Nagios hostgroup. You can then apply this service to the group rather than the individual cluster name objects. Do not assign the service to the nodes! The monitor will still work, but it’s inefficient and failures will result in many duplicate notifications.
###############################################################################
###############################################################################
#
# CLUSTER SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# check CLHV1's quorum witness status
define service{
        use                     generic-service
        host_name               clhv1
        service_description     Quorum Witness Status
        check_command           check-clusterquorumwitness
        }
Nagios must be restarted after these files are modified. Remember to run these separately. Do not just copy/paste! If the first command indicates a validation failure, check your work and fix the problem before restarting the Nagios service!
sudo service nagios checkconfig
sudo service nagios restart
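If you would rather script this sequence than type it each time, the important part is to keep the "stop on failure" rule intact. The following is a minimal sketch of one way to do that. It assumes that `service nagios checkconfig` exits with a non-zero code when validation fails, which you should confirm on your own system. The function name `validate_then_restart` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: run a validation command, and run the restart
# command only if validation succeeded.
validate_then_restart() {
    validate_cmd=$1
    restart_cmd=$2
    if $validate_cmd; then
        $restart_cmd
    else
        echo "Validation failed; not restarting" >&2
        return 1
    fi
}

# Usage on the Nagios host (left commented out so nothing restarts by accident):
# validate_then_restart "sudo service nagios checkconfig" "sudo service nagios restart"
```

Running the two commands by hand, as described above, remains the safer habit while you are still learning what validation failures look like.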