Nagios for Hyper-V: Monitoring the Oldest Checkpoint Age

The script in this article scans a Hyper-V host to find its oldest checkpoint. It is an active check tied to the host, not to any particular virtual machine. To use it, you must have a functioning Nagios environment and NSClient++ configured as described in our main Nagios article. It does not directly require any of the base scripts, but it relies on the configuration sections created in that article.

Updated May 2, 2018: Version 2.0

  • Using the CIM cmdlets instead of WMI cmdlets for speed
  • Improved performance by reducing number of CIM calls
  • The checkpoint report properly identifies the owning virtual machine
  • Ignores checkpoints created by a pooled VDI collection

NSClient++ Configuration

These changes are to be made to the NSClient++ files on all Hyper-V hosts to be monitored.

C:\Program Files\NSClient++\nsclient.ini

If the indicated INI section does not exist, create it. Otherwise, add only the check_checkpointage line to the existing section.

[/settings/external scripts/wrapped scripts]
check_checkpointage=check_hvcheckpointage.ps1 $ARG1$ $ARG2$

C:\Program Files\NSClient++\scripts\check_hvcheckpointage.ps1

This script scans a Hyper-V host for its oldest existing checkpoint and reports back to Nagios. The file does not exist by default; you must create it.

<#
	check_hvcheckpointage.ps1
	Written by Eric Siron
	(c) Altaro Software 2018

	Version 2.0 May 2, 2018

	Intended for use with the NSClient++ module from http://nsclient.org
	Checks a Hyper-V host for its oldest checkpoint and returns the status to Nagios.
#>
param(
	[Parameter(Position=1)][String]$WarningLevel = '2d',
	[Parameter(Position=2)][String]$CriticalLevel = '3d'
)

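# Holds the oldest checkpoint found, if any: checkpoint name, VM name, creation time
$OldestCheckpoint = $null

# Split each threshold into a unit letter (m, h, d, w) and a number.
# '\d+' is required so that a letter-first value such as 'd3' still yields its number.
if($WarningLevel -match '[mhdwMHDW]')
{
	$WarnMeasurement = $Matches[0].ToLower()
	if($WarningLevel -match '\d+')
	{
		$WarnLength = [int]$Matches[0]
	}
}

if($CriticalLevel -match '[mhdwMHDW]')
{
	$CriticalMeasurement = $Matches[0].ToLower()
	if($CriticalLevel -match '\d+')
	{
		$CriticalLength = [int]$Matches[0]
	}
}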

$OldestCheckpointCreationTime = [DateTime]::Now
# Each Msvm_SnapshotOfVirtualSystem association references one checkpoint through its Dependent property
$RawCheckpointIDs = Get-CimInstance -Namespace root/virtualization/v2 -Property Dependent -ClassName Msvm_SnapshotOfVirtualSystem
foreach ($RawCheckpointID in $RawCheckpointIDs)
{
	# Retrieve the checkpoint's settings, skipping rollback checkpoints from pooled VDI collections
	$Checkpoints = Get-CimInstance -Namespace root/virtualization/v2 -Property VirtualSystemIdentifier, CreationTime, ElementName -ClassName Msvm_VirtualSystemSettingData -Filter ('InstanceID="{0}" AND VirtualSystemType="Microsoft:Hyper-V:Snapshot:Realized" AND NOT ElementName LIKE "%RDV_ROLLBACK%"' -f $RawCheckpointID.Dependent.InstanceID)
	foreach($Checkpoint in $Checkpoints)
	{
		$CheckpointCreationDate = $Checkpoint.CreationTime
		if($CheckpointCreationDate -lt $OldestCheckpointCreationTime)
		{
			# Record the new minimum so that only older checkpoints can replace it
			$OldestCheckpointCreationTime = $CheckpointCreationDate
			$VM = Get-CimInstance -Namespace root/virtualization/v2 -Property ElementName -ClassName Msvm_ComputerSystem -Filter ('Name="{0}"' -f $Checkpoint.VirtualSystemIdentifier)
			$OldestCheckpoint = @($Checkpoint.ElementName, $VM.ElementName, $CheckpointCreationDate)
		}
	}
}
if($OldestCheckpoint)
{
	[TimeSpan]$CheckpointAge = [DateTime]::Now - $OldestCheckpoint[2]
	$AgeString = '{0} minutes' -f $CheckpointAge.Minutes
	if($CheckpointAge.Hours)
	{
		$AgeString = '{0} hours, {1}' -f $CheckpointAge.Hours, $AgeString
	}
	if($CheckpointAge.Days)
	{
		$AgeString = '{0} days, {1}' -f $CheckpointAge.Days, $AgeString
	}

	Write-Host ('Checkpoint "{0}" for VM "{1}" is {2} old. Created: {3}.' -f $OldestCheckpoint[0], $OldestCheckpoint[1], $AgeString, $OldestCheckpoint[2])
	# Compare against the total age. The Minutes and Hours properties hold only one
	# component of the TimeSpan (0-59 and 0-23), so they cannot represent the full age.
	$ComparisonLength = 0
	switch($CriticalMeasurement)
	{
		'm' {
			$ComparisonLength = $CheckpointAge.TotalMinutes
		}
		'h' {
			$ComparisonLength = $CheckpointAge.TotalHours
		}
		'd' {
			$ComparisonLength = $CheckpointAge.TotalDays
		}
		default {	# weeks
			$ComparisonLength = $CheckpointAge.TotalDays / 7
		}
	}
	if($ComparisonLength -gt $CriticalLength)
	{
		Exit 2	# Nagios: Critical
	}
	$ComparisonLength = 0
	switch($WarnMeasurement)
	{
		'm' {
			$ComparisonLength = $CheckpointAge.TotalMinutes
		}
		'h' {
			$ComparisonLength = $CheckpointAge.TotalHours
		}
		'd' {
			$ComparisonLength = $CheckpointAge.TotalDays
		}
		default {	# weeks
			$ComparisonLength = $CheckpointAge.TotalDays / 7
		}
	}
	if($ComparisonLength -gt $WarnLength)
	{
		Exit 1	# Nagios: Warning
	}
	Exit 0
}
else
{
	Write-Host 'No checkpoints'
	exit 0
}

Restart the NSClient++ service.
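Before relying on Nagios, it can help to run the check by hand on the Hyper-V host. The following is a minimal sketch, assuming an elevated PowerShell prompt, the script path shown above, and a Windows service named nscp (the name is an assumption; verify it with Get-Service on your host).

```powershell
# Path taken from this article; adjust it if NSClient++ is installed elsewhere
$Script = 'C:\Program Files\NSClient++\scripts\check_hvcheckpointage.ps1'

# Run the check in a child process so that its exit statement does not close this session.
# 2h and 3d are the warning and critical thresholds used later in this article.
& powershell.exe -NoProfile -ExecutionPolicy Bypass -File $Script 2h 3d

# Nagios reads the exit code: 0 = OK, 1 = Warning, 2 = Critical
"Exit code: $LASTEXITCODE"

# Restart the agent so that it rereads nsclient.ini ('nscp' is an assumed service name)
Restart-Service -Name nscp
```

If the script prints a checkpoint line and the exit code matches what you expect for its age, the host side is ready.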

Nagios Configuration

These changes are to be made on the Nagios host. I recommend using WinSCP as outlined in our main Nagios and Ubuntu Server articles.

/usr/local/nagios/etc/objects/commands.cfg

The Hyper-V Host Commands section should already exist if you followed our main Nagios article. Add this command there.

################################################################################
#
# Hyper-V Host Commands
#
################################################################################

# $ARG1$: age that triggers a warning condition. Use one letter (m = minute, h = hour, d = day, w = week) and one number, e.g. 3d for 3 days. The order of letter and number does not matter.
# $ARG2$: age that triggers a critical condition, in the same format
define command{
	command_name	check-checkpoint-age
	command_line	$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -p 5666 -c check_checkpointage -a $ARG1$ $ARG2$
}
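The threshold format described in the command comments (one unit letter plus a number) can be illustrated with a short PowerShell sketch. ConvertTo-ThresholdTimeSpan is a hypothetical helper written for this illustration; it is not part of check_hvcheckpointage.ps1. It also shows why a checkpoint's age should be judged by its total duration and not by a single component of a TimeSpan.

```powershell
# Hypothetical helper for illustration only; not part of check_hvcheckpointage.ps1.
# Converts a threshold such as '3d' or 'd3' into a TimeSpan.
function ConvertTo-ThresholdTimeSpan
{
	param([Parameter(Mandatory=$true)][String]$Threshold)

	# -match is case-insensitive, so normalize the unit letter afterward
	if($Threshold -match '[mhdw]') { $Unit = $Matches[0].ToLower() }
	else { throw "No unit letter (m, h, d, w) in '$Threshold'" }

	# '\d+' requires at least one digit, wherever it appears in the string
	if($Threshold -match '\d+') { $Length = [int]$Matches[0] }
	else { throw "No number in '$Threshold'" }

	switch($Unit)
	{
		'm' { return [TimeSpan]::FromMinutes($Length) }
		'h' { return [TimeSpan]::FromHours($Length) }
		'd' { return [TimeSpan]::FromDays($Length) }
		'w' { return [TimeSpan]::FromDays($Length * 7) }
	}
}

# A checkpoint that is 1 day and 1 hour old
$Age = New-TimeSpan -Days 1 -Hours 1
$Age -gt (ConvertTo-ThresholdTimeSpan '2h')	# True: past a 2-hour warning threshold
$Age -gt (ConvertTo-ThresholdTimeSpan '3d')	# False: not yet past a 3-day critical threshold
$Age.Hours		# 1: the hours component only
$Age.TotalHours	# 25: the full age in hours
```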

/usr/local/nagios/etc/objects/hypervhost.cfg

This file and section were created in the required base scripts article.

###############################################################################
###############################################################################
#
# HYPER-V SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# check hosts individually for oldest checkpoint
define service{
	use			generic-service
	hostgroup_name		hyper-v-servers
	service_description	All VMs: Max Checkpoint Age
	check_command		check-checkpoint-age!2h!3d
}

As shown, each host in “hyper-v-servers” will be checked at the default interval. If a checkpoint is older than 3 days, it will trigger a Critical alert. If a checkpoint is older than 2 hours, it will trigger a Warning. You can modify the above as needed. You can also duplicate this service and apply it to a specific “host_name” instead of a “hostgroup_name” to set per-host warning and critical levels.
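A per-host variant might look like the sketch below. The host name hv-host-01 and the 1d/1w thresholds are placeholders chosen for illustration. The service description is deliberately different from the host group service so that the two definitions do not collide if the host also belongs to “hyper-v-servers”.

```
# check one specific host with its own thresholds (placeholder host name)
define service{
	use			generic-service
	host_name		hv-host-01
	service_description	All VMs: Max Checkpoint Age (custom)
	check_command		check-checkpoint-age!1d!1w
}
```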

Verify the configuration, then restart Nagios to apply it.

sudo service nagios checkconfig
sudo service nagios restart

 
