Nagios for Hyper-V: Cluster Shared Volume Status

Microsoft failover clustering offers Cluster Shared Volumes (CSVs). These allow all nodes of a cluster to communicate with the same block storage LUN simultaneously. “Redirected Access” enhances this feature by redirecting I/O through the CSV’s owning node when any of the other nodes cannot reach the storage directly. Unfortunately, if you don’t have a monitoring system in place, a CSV could go into Redirected Access mode and you’d never know. In the best case, you take only a minor performance hit. Depending on your available physical pathways, that performance hit might also impact Live Migration. If only a single node has direct access, then all of the contained roles will fail if that node fails. Of course, you’d also probably like to know if a CSV goes offline completely.

This script allows Nagios to watch a single designated CSV. If the CSV fails completely, the script sets a Critical state in Nagios. If the CSV is in Maintenance Mode, the script sets a Warning state. My thought process for that condition is that Maintenance Mode is usually intentional, but you don’t want a CSV left there for an extended period of time. You choose how the script treats a Redirected Access state: ignore it, raise a Warning, or raise a Critical. If you’re using a guest cluster in 2012 R2 with a shared VHDX, then the CSV will always be in Redirected Access mode, so that would be a normal condition for you.

This script is useful for any cluster that uses CSVs (for example, SOFS and SQL), not just Hyper-V clusters.

If you’re new to Nagios, then you should probably start with the How To: Monitor Hyper-V with Nagios article. I did publish a follow-up article with a script with some base functions for Hyper-V, but that script is not required to use this one. The base script for clusters is required. It’s linked below.

NSClient++ Configuration

These changes are to be made to the NSClient++ files on all Windows nodes that are part of the cluster to be monitored. These instructions do not include configuring NSClient++ to operate PowerShell scripts. Please refer to the aforementioned how-to article for that.


C:\Program Files\NSClient++\nsclient.ini

If the indicated INI section does not exist, create it. Otherwise, just add the second line to the existing section.

[/settings/external scripts/wrapped scripts]
check_csvstatus=check_csvstatus.ps1 $ARG1$ $ARG2$

C:\Program Files\NSClient++\scripts\check_csvstatus.ps1

The required script clusterbase.ps1 must exist in the same folder. check_csvstatus.ps1 was written against version 1.1 of clusterbase.ps1 and returns an Unknown status if it finds an older version.

<#
	check_csvstatus.ps1
	Written by Eric Siron
	(c) Altaro Software 2017

	Version 1.1 November 17, 2017

	Intended for use with the NSClient++ module from http://nsclient.org
	Checks a Cluster Shared Volume and returns the status to Nagios.

	# for $RedirectedAccessHandleMode, specify 0 to ignore, 1 to treat as a warning, 2 to treat as critical
#>

param(
	[Parameter(Position=1)][String]$CSVName,
	[Parameter(Position=2)][UInt16]$RedirectedAccessHandleMode = 1
)

begin {
	$RequiredClusterBaseVersion = 1.1
}

process {
	if([String]::IsNullOrEmpty($CSVName))
	{
		Write-Host -Object 'No CSV was specified'
		Exit 3
	}

	$ClusterBase = Join-Path -Path $PSScriptRoot -ChildPath 'clusterbase.ps1'
	. $ClusterBase

	$ClusterBaseVersion = Get-ANClusterBaseVersion
	if($ClusterBaseVersion -lt $RequiredClusterBaseVersion)
	{
		Write-Host -Object ('clusterbase.ps1 must be at least version {0} to use this script (found version: {1})' -f $RequiredClusterBaseVersion, $ClusterBaseVersion)
		Exit 3
	}

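	# Look up the CSV through clusterbase.ps1, then map its FaultState to a Nagios exit code:
	# 0 = OK, 1 = Warning, 2 = Critical, 3 = Unknown
	$CSVPartition = Get-ANCSVFromCSVName -CSVName $CSVName
	switch($CSVPartition.FaultState)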
	{
		0 {
			Write-Host -Object 'Normal operation'
			Exit 0
		}
		1 {
			Write-Host -Object 'Redirected Access'
			Exit $RedirectedAccessHandleMode
		}
		2 {
			Write-Host -Object 'No Access'
			Exit 2
		}
		3 {
			Write-Host -Object 'Maintenance Mode'
			Exit 1
		}
		default {
			Write-Host -Object ('Unable to detect the status of CSV "{0}"' -f $CSVName)
			Exit 3
		}
	}
}

Nagios Configuration

These changes are to be made on the Nagios host. I recommend using WinSCP as outlined in our main Nagios and Ubuntu Server articles.

/usr/local/nagios/etc/objects/commands.cfg

The Hyper-V Host Commands section should already exist if you followed our main Nagios article. Add this command there. If you are not working with a Hyper-V system, then you can create any section heading that makes sense to you, or just insert the command wherever you like.

define command{
	command_name	check-csvstatus
	command_line	$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -p 5666 -c check_csvstatus -a $ARG1$ $ARG2$
}
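
Before wiring the command into a service, it can help to run the same check by hand from the Nagios host. The following is a minimal sketch, not part of the original scripts: the function names are hypothetical, and the check_nrpe path is an assumption based on a source install (it is where $USER1$ usually points). Adjust both to your system.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for running check_csvstatus by hand from the Nagios host.

# Assumption: check_nrpe sits in the default source-install plugin folder.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# Translate a plugin exit code into the Nagios state name.
nagios_state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT OF RANGE" ;;
    esac
}

# Call check_csvstatus with the same arguments as the command definition,
# then print the plugin text together with the state it maps to.
# $1 = host or cluster name object, $2 = CSV name, $3 = Redirected Access mode
test_csv_check() {
    local host="$1" csv="$2" mode="${3:-1}"
    local output status
    output="$("$CHECK_NRPE" -H "$host" -t 30 -p 5666 -c check_csvstatus -a "$csv" "$mode")"
    status=$?
    echo "${output} [$(nagios_state_name "$status")]"
    return "$status"
}
```

With the functions loaded, something like `test_csv_check clhv1 CSV1 2` runs the check against the sample cluster and shows both the script's message and the state that Nagios would record.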

/usr/local/nagios/etc/objects/hypervhost.cfg

This file and section were created in the Hyper-V base scripts article. As long as the service definition appears somewhere in one of the activated .cfg files, it will work.

This is a sample! You must use your own cluster name object and CSV name!

The parts you want to set are:

  • For the host_name, enter the cluster name object of the cluster that hosts the CSV. Mine is called “clhv1”.
  • For the service_description, use whatever makes sense to you. This is what appears in the Nagios web interface and in any alert e-mails.
  • For the check_command, use the format check-csvstatus!csvname!#

The number at the end of the check_command line specifies how you want to treat the CSV if it is in Redirected Access mode. In the following sample, I used a 2. Values are:

  • 0: a Redirected Access status will be noted but ignored. Use this for CSVs with guest clusters using shared VHDX on 2012 R2
  • 1: a Redirected Access status will set a Warning condition in Nagios; this is the default in the script, although I didn’t test how Nagios/NSClient++ cope with a parameter that isn’t specified
  • 2: a Redirected Access status will set a Critical condition in Nagios

###############################################################################
###############################################################################
#
# CLUSTER SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# check status of CSV1 on CLHV1
define service {
	use			generic-service
	host_name		clhv1
	service_description	CSV1 Status
	check_command		check-csvstatus!CSV1!2
}
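
Because the script watches one CSV per service, a cluster with several CSVs needs one service definition for each. Here is a rough sketch of a generator for those blocks. The function name and the CSV names in the loop are placeholders I made up for illustration, so substitute your own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one Nagios service definition for a CSV.
# $1 = cluster name object, $2 = CSV name, $3 = Redirected Access mode (0, 1, or 2)
make_csv_service() {
    local host="$1" csv="$2" mode="${3:-1}"
    printf 'define service {\n'
    printf '\tuse\t\t\tgeneric-service\n'
    printf '\thost_name\t\t%s\n' "$host"
    printf '\tservice_description\t%s Status\n' "$csv"
    printf '\tcheck_command\t\tcheck-csvstatus!%s!%s\n' "$csv" "$mode"
    printf '}\n\n'
}

# Example: three placeholder CSVs on the sample cluster,
# with Redirected Access treated as Critical.
for csv in CSV1 CSV2 CSV3; do
    make_csv_service clhv1 "$csv" 2
done
```

Review the output, then paste it into the .cfg file that holds your cluster service definitions.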

Nagios must be restarted after these files are modified.

sudo service nagios checkconfig
sudo service nagios restart
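
If your service wrapper does not offer checkconfig, the Nagios binary can verify the configuration itself with -v. The sketch below restarts Nagios only when that verification passes. The function name is hypothetical, and the binary and configuration paths are assumptions based on a default source install, so confirm them on your system. Run it as root or through sudo.

```shell
#!/usr/bin/env bash
# Assumptions: default source-install locations for the binary and main config.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Hypothetical helper: verify the configuration, restart only on success.
restart_nagios_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        service nagios restart
    else
        echo "Configuration check failed; Nagios was not restarted." >&2
        return 1
    fi
}
```

Calling `restart_nagios_if_valid` after editing the .cfg files keeps a typo from taking monitoring offline.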
5 thoughts on "Nagios for Hyper-V: Cluster Shared Volume Status"

  • Sanjay Yadav says:

    Hello Eric,

    thank you for the nice article.I have 8 node hyper v Cluster and have already installed the Nsclient plugin and can easily read the SNMP values of the servers, like ram, cpu and Uptime,

    when i follow your article to monitor the status of CSV volumes, it gives me this error:
    CHECK_NRPE: Error – Could not complete SSL handshake.

    can you help me what can be the cause ?
