How to Securely Monitor Hyper-V with Nagios and NSClient


I’ve provided some articles on monitoring Hyper-V using Nagios. In all of them, I’ve specifically avoided the topic of securing the communications chain. On the one hand, I figure that we’re only working with monitoring data; we’re not passing credit card numbers or medical history.

On the other hand, several of my sensors use parameterized scripts. If I didn’t design my scripts well, then perhaps someone could use them as an attack vector. Rather than pretend that I can ever be certain that I’ll never make a mistake like that, I can bundle the communications into an encrypted channel. Even if you’re not worried about the scripts, you can still enjoy some positive side effects.

What’s in this Article

The end effects of following this article through to completion:

  • You will access your Nagios web interface at an https:// address.
  • You can access your Nagios web interface using your Active Directory domain credentials. You can also allow others to use their credentials. You can control what any given account can access.
  • Nagios will perform NRPE checks against your Hyper-V and Windows Server systems using a secured channel.

As you get started, be advised that this is a very long article. I did test every single line, usually multiple times. You will almost always need to use sudo. I tried to add it everywhere it was appropriate, but each time I proofread this article, I find another that I missed. You might want to just sudo -s right in the beginning and be done with it.

General Philosophy

When I started working on this article, I fully intended to utilize a Microsoft-based certificate authority. Conceptually, PKI (public key infrastructure) is extremely simple. But, from that simplicity, implementers run off and make things as complicated as possible. I have not encountered any worse offender than Microsoft. After several days struggling against it, I ran up against problems that I simply couldn’t sort out. After trying to decipher one too many of Microsoft’s cryptic and non-actionable errors (“Error 0x450a05a1: masking tape will not eat pearl soufflé from the file that cannot be named”), I finally gave up. So, while it should be possible to use a Microsoft CA for everything that you see here, I cannot demonstrate it. Be aware that Microsoft’s tools tend to output DER (binary) certificates. Choose the Base64 option when you can. You can convert DER to Base64 (PEM).

Rather than giving up entirely, I re-centered myself and adopted the following positions:

  • Most of my readers probably don’t want to go to the hassle of configuring a Microsoft CA anyway; many of you don’t have the extra Windows Server licenses for that sort of thing, either
  • We’re securing monitoring traffic, not state secrets. We can shift the bulk of the security responsibility to the endpoints
  • To make all of this more secure, one simply needs to use a more secure CA. The remainder of the directions stay the same

Some things could be done differently. In a few places, I’m fairly certain that I worked harder than necessary (i.e., could have used fewer arguments to openssl). Functionality and outcome were most important.

In general, certificates are used to guarantee the identity of hosts. Anything else — host names, IP addresses, MAC addresses, etc., can be easily spoofed. In this case, we’re locking down a monitoring system. If someone manages to fool Nagios… uh… OK then. I am more concerned that, if you use the scripts that I provide, we are transmitting PowerShell commands to a service running with administrative privileges, and the transmission is sent in clear text. There are many safeguards to prevent that from being a security risk, but I want to add layers on top of that. So, while host authentication is always a good thing, my primary goal in this case is to encrypt the traffic. It’s on you to take precautions to lock down your endpoints. Maintain a good root password on your Linux boxes, maintain solid password protection policies, etc.

I borrowed heavily from many sources, but probably none quite so strongly as Nagios’ own documentation: https://support.nagios.com/kb/article.php?id=519.

Prerequisites

You need a Linux machine running Nagios. I wrote one guide for doing that on Ubuntu. I wrote another guide for doing that on CentOS. I have a third article forthcoming for doing the same with OpenSUSE. It’s totally acceptable for you to bring your own. The distributions aren’t so radically different that you won’t be able to figure out any differences that survive this article.

Also, for any of this to make sense, you need at least one Windows Server/Hyper-V Server system to monitor.

Have Patience! I can go on and on all day about how Microsoft makes a point of avoiding actionable error messages. In this case, they are far from alone. I lost many hours trying to decipher useless messages from openssl and NRPE. Solutions are usually simple, but the problems are almost always frustrating because the authors of these tools couldn’t be bothered to employ useful error handling. NSClient++ treated me much better, but even that let me down a few times. Take your time and remember that, even though there are an absurd number of configuration points, certificate exchange is fundamentally simple. Whatever problem you encounter is probably a small one.

Step 1. Acquire and Enable openssl

Every distribution that I used already had openssl installed. Just to be sure, use your distribution’s package manager. Examples:
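
The package name is the same across the big three distributions; pick the line for your platform:

    sudo apt-get install openssl     # Ubuntu
    sudo yum install openssl         # CentOS
    sudo zypper install openssl      # OpenSUSE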

You’ll probably get a message that you already have the package. Good!

Next, we need a basic configuration. You should automatically get one along with the default installation of openssl. Look for a file named “openssl.cnf”. It will probably be in /etc/ssl or /usr/lib/ssl. Linux can help you:
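
For example, a brute-force search (the 2>/dev/null just hides permission noise):

    sudo find / -name openssl.cnf 2>/dev/null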

If you haven’t got one, then maybe removing and reinstalling openssl will create it… I never tried that. You could try this site: https://www.phildev.net/ssl/opensslconf.html. I’ll also provide the one that I used. Different sections of the file are used for different purposes. I’ll show each portion in context.

Set Up Your Directories and Environment

You will need to place your certificate files in a common place. First, look around the location where you found the openssl.cnf file. Specifically, check for “private” and “certs” directories. If they don’t exist, you can make some.
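
A single catch-all directory works; /var/certs is the location this article refers back to later:

    sudo mkdir /var/certs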

To keep things simple, I just dump everything there on systems that need a directory created. I will write the remainder of this document using that directory. If your system already has the split directories, use “private” to hold key files and “certs” to hold certificate files. Note that if you find these files underneath a “ca” path, that is for the certificate authority, not the client certificates that I’m talking about. I’ll specifically cover the certificate authority in the next section.

Step 2. Set Up a Certificate Authority

In this implementation, the Linux system that runs Nagios will also host a certificate authority. We’ll use that CA to generate certificates that Nagios and NRPE can use. Some people erroneously refer to those as “self-signed” because they aren’t issued by an “official” CA. However, that’s not the definition of “self-signed”. A self-signed certificate doesn’t have an authority chain. In our case, that term will apply only to the CA’s own certificate, which will then be used to sign other certificates. All of those will be authentic, not-self-signed certificates. As I describe it, you’ll use the same system for both your CA and Nagios, but you could just as easily spin up another Linux system to be the CA. You would only need to copy CSR, certificate, and key files across the separate systems as necessary to implement that.

Set Up Your Directories and Environment

You need places to put your CA’s files and certificates. openssl will require its own particular files. If you found some CA folders near your openssl.cnf, use those. Otherwise, you can create your own.
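
A sketch of a layout that openssl’s ca command can work with, matching the /var/ca paths referenced later in this article (the index and serial files are bookkeeping files that openssl ca requires; the names must match whatever your openssl.cnf points at):

    sudo mkdir -p /var/ca/certs /var/ca/newcerts /var/ca/private
    sudo chmod 700 /var/ca/private
    sudo touch /var/ca/index.txt
    echo 01 | sudo tee /var/ca/serial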

Configure your default openssl.cnf (sometimes openssl.conf). Note the file locations that I mentioned in the previous section. Mine looks like this:
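
The full file is long, so this is only a heavily trimmed excerpt of the sections that matter here, using the /var/ca layout and file names assumed in these examples; treat it as a starting point rather than a drop-in replacement:

    [ ca ]
    default_ca          = CA_default

    [ CA_default ]
    dir                 = /var/ca
    certs               = $dir/certs
    new_certs_dir       = $dir/newcerts
    database            = $dir/index.txt
    serial              = $dir/serial
    private_key         = $dir/private/ca_key.pem
    certificate         = $dir/certs/ca_cert.pem
    default_md          = sha256
    default_days        = 3650
    policy              = policy_anything

    [ req ]
    default_bits        = 2048
    distinguished_name  = req_distinguished_name
    req_extensions      = v3_req

    [ req_distinguished_name ]
    countryName_default             = US
    stateOrProvinceName_default     = Your State
    localityName_default            = Your City
    0.organizationName_default      = Your Organization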

Make certain that you walk through it and enter your localized settings in place of mine. Also note that I’ve removed the comment mark in front of “req_extensions = v3_req”.

Create the CA certificate:
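
Something along these lines, assuming the /var/ca layout and file names above; adjust -days and the config path to suit, and note that, depending on your openssl build, the pass phrase prompt may arrive before the subject questions rather than after:

    cd /var/ca
    sudo openssl req -new -x509 -extensions v3_ca -days 3650 -config /etc/ssl/openssl.cnf -keyout private/ca_key.pem -out certs/ca_cert.pem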

You will first be asked to answer a series of questions. If you filled out the fields correctly, then you can just press [Enter] all the way through them. You will then be asked to provide a password for the private key. Even though we aren’t securing anything of earth-shattering importance, take this seriously.

Your CA’s private key is the most vital file out of all that you’ll be creating. We’re going to lock it down so that it can only be accessed by root:
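
Assuming the key file name from the examples above:

    sudo chown root:root /var/ca/private/ca_key.pem
    sudo chmod 600 /var/ca/private/ca_key.pem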

The public key is included in the public cert file (ca_cert.pem). That can safely be read by anyone, anywhere, any time.

For bonus points, research setting up access to your new CA’s certificate revocation list (CRL). I did not set that up for mine.

Step 3. Set Your Managing Computer to Trust the CA

Your management computer will access the Nagios site that will be secured by your new CA. Therefore, your management computer needs to trust the certificates issued by that CA, or you’ll get warnings in every major browser.

For a Linux machine (client, not the Nagios server), check to see if /etc/ssl/certs contains several files. If it does (Ubuntu, openSUSE), just copy the CA cert there. You can rename the file so that it stands out better, if you like. Not every app on Linux will read that folder; you’ll need to find directions for those apps specifically.

If your Linux distribution doesn’t have that folder (CentOS), then look for /etc/pki/ca-trust/source/anchors. If that exists (CentOS), copy the certificate file there. Then, run:
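
The command on CentOS is:

    sudo update-ca-trust extract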

For a Windows machine:

  1. Use WinSCP to transfer the ca_cert.pem file to your Windows system (not the key; the key never needs to leave the CA).
  2. Run MMC.EXE as administrator.
  3. Click File->Add/Remove Snap-in.
  4. Choose Computer Account and click Next.
  5. Leave Local Computer selected and click Finish.
  6. Click OK back on the Add/Remove Snap-ins dialog.
  7. Back in the main screen, right-click Trusted Root Certification Authorities. Hover over All Tasks, then click Import.
  8. On the Welcome screen, you should not be allowed to change the selection from Local Machine.
  9. Browse to the file that you copied over using WinSCP. You’ll either need to change the selection to allow all files or you’ll need to have renamed the certificate to have a .cer extension.
  10. Choose Trusted Root Certification Authorities.
  11. Click Finish on the final screen.
  12. Find your new CA in the list and double-click it to verify.

The above steps can be duplicated for other computers that need to access the Nagios site. For something a bit more widespread, you can deploy the certificate using Group Policy Management Console. In the GPO’s properties, drill down to Computer Configuration\Windows Settings\Security Settings\Public Key Policies\Trusted Root Certification Authorities. You can right-click on that node and click Import to start the same wizard that you used above.

Note: Internet Explorer, Edge, and Chrome will use trusted root certificates from the Windows store. The authors of Firefox have decided that reinventing the wheel and maintaining a completely separate certificate store makes sense somehow. You’ll have to configure its trusted root certificate store within the program.

Step 4. Secure the Nagios Web Site

If you followed any of my earlier guides, you’re accessing your Nagios site over port 80 with Basic authentication. That means that any moderately enterprising soul can snag your Nagios site’s clear-text password(s) right out of the Ethernet. You have several options to fix that. I chose to use an SSL site while retaining Basic authentication. Your password still travels, but it travels encrypted. As long as you protect the site’s private key, an attacker should find cracking your password prohibitively difficult.

You could also use Kerberos authentication to have the Nagios site check your credentials against Active Directory. When that works, it appears that your password is protected, even using unencrypted HTTP. However, I could not find an elegant way to combine that with the existing file-based authentication. So, if you’re one of my readers at a smaller site with only one or two domain controllers and you lose your domain for some reason, you’d also lose your ability to log in to your monitoring environment. Also, managing Kerberos users in Nagios is kind of ugly. I didn’t find that a palatable option.

So, we’re going to keep the file-based authentication model and add LDAP authentication on top of it. You’ll be able to use your Active Directory account to log in to the Nagios site, but you’ll also be able to fall back to the existing “nagiosadmin” account when necessary.

One thing that I don’t demonstrate is updating the firewall to allow for port 443. Whatever directions you used to open up port 80, follow those for 443.

Create the Certificate for Apache

If you only use the one site address, then you can continue using the same openssl.cnf file from earlier steps. So, if I were using “https://svlmon1.siron.int/nagios” to access my site, then I would just proceed with what I have. However, I access my site with “https://nagios.siron.int”. I also have a handful of other sites on the same system. I (and you) could certainly create multiple certificates to handle them all. I chose to use Subject Alternative Names instead. That means that I create a single certificate with all of the names that I want. It means less overhead and micromanagement for me. Again, we’re not hosting a stock exchange, so we don’t need to complicate things.

You have two choices:

  1. Edit your existing openssl.cnf file with the differences for the new certificate(s).
  2. Copy your existing openssl.cnf file, make the changes to the copy, and override the openssl command to use the copied file.

I suppose a third option would be to hack at the openssl command line to manually insert what you want. That requires more gumption than I can muster, and I don’t see any benefits. I’m going with option 2.

Of course, it’s not a requirement to use nano. Use the editor you prefer.
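
For example (the copy’s name is arbitrary; the rest of these examples assume openssl_apache.cnf):

    sudo cp /etc/ssl/openssl.cnf /etc/ssl/openssl_apache.cnf
    sudo nano /etc/ssl/openssl_apache.cnf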

The following shows sample additions to the file. They are not sufficient on their own!
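
A sketch of what those additions might look like; the host names are from my lab and the IP is a placeholder, so substitute every name and address that you use to reach the site:

    req_extensions = v3_req

    [ v3_req ]
    subjectAltName = @alt_names

    [ alt_names ]
    DNS.1 = nagios.siron.int
    DNS.2 = svlmon1.siron.int
    IP.1 = 192.168.25.5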

The req_extensions line already exists in the default sample config, but has a hash mark in front of it to comment it out. Remove that (or type a new line, whatever suits you). The [ v3_req ] section probably exists; whatever it’s got, leave it. Just add the subjectAltName line. The [ alt_names ] segment won’t exist (probably). Add it, along with the DNS and IP entries that you want.

Note: The certificates we create now are not part of the CA. I sudo mkdir /var/certs to hold non-CA cert files. That’s a convenience, not a requirement. Follow the guidance from earlier.

If you’re copy/pasting, note that I used a .cnf file from /etc/ssl. Your structure may be different.
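
A sketch of the request, assuming the copied config from above and arbitrary output names (nagios_web_key.pem and nagios_web.csr) in /var/certs:

    cd /var/certs
    sudo openssl req -new -newkey rsa:2048 -nodes -keyout nagios_web_key.pem -out nagios_web.csr -config /etc/ssl/openssl_apache.cnf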

You will be asked to answer a series of questions like you did for the CA, with an additional query regarding a password and a company name. I would just [Enter] through both of those.

Verify that your CSR has the necessary Subject Alternate Names:
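
Assuming the file names above:

    openssl req -text -noout -in /var/certs/nagios_web.csr | grep -A 1 'Subject Alternative Name'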

Secure the private key:
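
Again assuming the file names above (Apache reads the key as root at startup, so root-only permissions are fine):

    sudo chown root:root /var/certs/nagios_web_key.pem
    sudo chmod 600 /var/certs/nagios_web_key.pem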

Submit the request to your new CA:
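
A sketch, assuming the copied config (which still carries your CA_default section) and the CSR name from above; -extensions v3_req is what carries the Subject Alternative Names into the signed certificate:

    sudo openssl ca -config /etc/ssl/openssl_apache.cnf -extensions v3_req -in /var/certs/nagios_web.csr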

First, openssl will ask you to supply the password for the CA’s private key. Next, you’ll be shown a preview of the certificate and asked twice to confirm its creation.

The generated file will appear in your CA’s configured output directory (the one used in these directions is /var/ca/newcerts). It will use the next serial from your /var/ca/serial file as the name. So, if you’re following straight through, that will be /var/ca/newcerts/01.pem. You can ls /var/ca/newcerts to see them all. The highest number is the one that was just generated. Verify that it’s yours:
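
For example, if the newest file is 01.pem:

    sudo openssl x509 -text -noout -in /var/ca/newcerts/01.pem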

Transfer the certificate to whatever location that you’ll have Apache call it from, and, for convenience, rename it:
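
Assuming the same locations and a friendlier name:

    sudo cp /var/ca/newcerts/01.pem /var/certs/nagios_web_cert.pem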

Tell Apache to Use SSL with the Certificate

Apache allows so much latitude in configuration that it appears to be complicated. Every distribution that installs Apache from repositories follows its own conventions, making things even more challenging. I’ll help guide you where possible. If you feel lost, just remember these things:

  • The last occurrence of a configuration setting that Apache reads overrides all previous occurrences of that setting
  • Apache reads files in alphabetical order
  • Apache doesn’t care about file names, only extensions

So, any time that a configuration doesn’t work, that means that a later setting overrides it. It might be further down in the same file or it might be in another file, but it’s out there somewhere. It might be in a file with a seemingly related name, but it might not be.

Start by locating the master Apache configuration file.

  • Ubuntu and OpenSUSE: /etc/apache2/apache2.conf
  • CentOS: /etc/httpd/conf/httpd.conf

This file will help you to figure out what extensions qualify a configuration file and which directories Apache searches for those configuration files.

We will take these basic steps:

  1. Enable SSL
  2. Instruct Apache to listen on ports 80 and 443
  3. Instruct Apache to redirect all port 80 traffic to port 443
  4. Secure all 443 traffic with the certificate that we created in the preceding section

Enable SSL in Apache

Your distribution probably enabled SSL already. Verify on Ubuntu with apache2 -M | grep ssl. Verify on CentOS/OpenSUSE with httpd -M | grep ssl. If you are rewarded with a return of ssl_module, then you don’t need to do anything else.

To enable Apache SSL on Ubuntu/OpenSUSE: sudo a2enmod ssl.

To enable Apache SSL on CentOS: sudo yum install mod_ssl.

Configure SSL in Apache

We could do all of steps 2-4 in a single file or across multiple files. I tend to do step 2 in a place that makes sense for the distribution, then steps 3 and 4 in the primary site configuration file. We could also spread out certificates across multiple virtual hosts. I’m not hosting tenants, so I tend to use one virtual host per site, but each uses the same certificate.

Remember, it doesn’t really matter where any of these things are set. The only thing that matters is that they are processed by Apache after any conflicting settings. Do your best to simply eliminate any conflicts. For instance, CentOS puts a lot of SSL settings in /etc/httpd/conf.d/ssl.conf. For that distribution, I left all of the settings it creates for defaults but commented out the entire VirtualHost item. I strongly encourage you to create backup copies of any file before you modify them. Ex: cp /etc/httpd/conf.d/ssl.conf /etc/httpd/conf.d/ssl.conf.original

Somewhere, you need a Listen 443 directive. Most distributions will automatically set it when you enable SSL (look in ports.conf or a conf file with “ssl” in the name). However, I’ve had a few times when that only worked for IPv6. If you can’t get 443 to work on IPv4, try Listen 0.0.0.0:443. This resolves step 2.

Next, we need a port 80 to 443 redirect. Apache has an “easy” Redirect command, but it’s too restrictive unless you’re only hosting a single site. In my primary site file, I create an empty port 80 site that redirects all inbound requests to an otherwise identical location using https:
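
A minimal sketch using mod_rewrite (which must be enabled on your system); the plain Redirect directive mentioned above would also work if you only host a single site:

    <VirtualHost *:80>
        RewriteEngine On
        RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
    </VirtualHost>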

This sequence sends a 301 code back to the browser along with the “corrected” URL. As long as the browser understands what to do with 301s (every modern browser does), then the URL will be rewritten right in the address bar. If you’re stuck for where to place this, I recommend:

  • On Ubuntu: /etc/apache2/sites-available/000-default.conf (symlinked from /etc/apache2/sites-enabled/)
  • On CentOS: /etc/httpd/conf.d/sites.conf
  • On OpenSUSE: /etc/apache2/default-server.conf

Wherever you put it, you need to verify that there are no other virtual hosts set to port 80. If there are, comment them out. You could also replace the 80 with 443, provided that you also add in the certificate settings that I’m about to show you.

After setting up the 80->443 redirect, you next need to configure a secured virtual host. It must do two things: listen on port 443 and use a certificate to encrypt traffic. Mine looks like this:
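
A minimal sketch, with the host names from my lab and the certificate file names assumed earlier; substitute your own:

    <VirtualHost *:443>
        ServerName nagios.siron.int
        ServerAlias svlmon1.siron.int
        SSLEngine on
        SSLCertificateFile /var/certs/nagios_web_cert.pem
        SSLCertificateKeyFile /var/certs/nagios_web_key.pem
    </VirtualHost>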

If you have other sites on the same host, create essentially the same thing but use the ServerName/ServerAlias fields to differentiate. For instance, my MRTG site is on the same server:

If you want, you can certainly use the instructions from the preceding section to create as many additional certificates as necessary for your other sites.

You’ve finished the hard work! Now just restart Apache. service apache2 restart on systems that name the service “apache2” (Ubuntu, OpenSUSE) or service httpd restart (CentOS). Test by accessing the site using an https prefix, then again with an http prefix to ensure that redirection works.

Step 5: Configure Nagios for Active Directory Authentication

Now that we’re securing the Nagios web site with SSL, we can feel a little bit better about entering Active Directory domain credentials into its challenge dialog. We have five phases for that process.

  1. Create (or designate) an account to use for directory reads.
  2. Select an OU to scan for valid accounts.
  3. Enable Apache to use LDAP authentication.
  4. Configure Apache directory security.
  5. Set Nagios to recognize Active Directory accounts.

Create an Account for Directory Access

Use PowerShell or Active Directory Users and Computers to create a user account. It does not matter what you call it. It does not matter where you put it. It does not need to have any group membership other than the default Domain Users group. It only requires enough powers to read the directory, which all Domain Users have by default. I recommend that you set its password to never expire or be prepared to periodically update your Nagios configuration files.

Once you’ve created it, you need its distinguished name. You can find that on the Attribute Editor tab in ADUC. You can also find it with Get-ADUser:
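
A quick way to pull it with PowerShell; the account name here is a placeholder for whatever you called yours:

    Get-ADUser -Filter 'Name -eq "svc-nagios-ldap"' | Select-Object -ExpandProperty DistinguishedName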


Keep the DN and the password on hand. You’ll need them in a bit.

Selecting OUs for Apache LDAP

When an account logs in to the web site, Apache’s mod_authnz_ldap will search for it within locations that you specify. You need to know the distinguished name of at least one organizational unit. Apache’s mod_ldap queries cannot run against the entire directory. I found many, many, many articles claiming that it’s possible, including Apache’s official documentation, but they are all lies (thanks for wasting hours of my time on searches and tests, though, guys, I always appreciate that).

It will, however, search multiple locations, and it can search downward into the child OUs of whatever OU you specify. Luckily for me, I have a special Accounts OU that I’ve created to organize user accounts. Hopefully, you have something similar. If not, you can use the default Users folder. You can do both.

I’ll show you how to connect to an OU and the default Users folder.


It is not necessary for the directory read account that you created in the first part of this section to exist in the selected location(s). The target location(s), or a sub-OU, only needs to contain the accounts that will log in to Nagios.

Once you’ve made your selection(s), you need to know the distinguished name(s). You can use the Attribute Editor tab like you did for the user, or Get-ADOrganizationalUnit:
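
For example, for the Accounts OU that I mentioned above:

    Get-ADOrganizationalUnit -Filter 'Name -eq "Accounts"' | Select-Object -ExpandProperty DistinguishedName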


Enabling LDAP Authentication in Apache

Apache requires two modules for LDAP authentication: authnz_ldap_module and ldap_module. You will probably need to enable them, but you can check in advance. On Ubuntu, use apache2 -M | grep ldap. On CentOS/OpenSUSE, use httpd -M | grep ldap. If you see both of these modules, then you don’t need to do anything else.

To enable Apache LDAP authentication on Ubuntu/OpenSUSE: sudo a2enmod authnz_ldap. You might also need to: sudo a2enmod ldap.

To enable Apache LDAP authentication on CentOS: yum install mod_ldap.

Make certain to perform the apachectl -M verification afterward to ensure that both modules are available.

Configuring Apache Directories to Use LDAP Authentication

Collect your OU DN(s), your user DN, and the password for that user. Now, we’re going to configure LDAP authorization sections in Apache. Again, you can put these in any conf file that pleases you. I usually find the distribution’s LDAP configuration file:

  • Ubuntu: /etc/apache2/mods-available/ldap.conf
  • CentOS: /etc/httpd/conf.modules.d/01-ldap.conf
  • OpenSUSE: no default file is created for the ldap module on OpenSUSE; you can create your own or add it to another, like /etc/apache2/global.conf

Warning: On Ubuntu, the files always exist in mods-available; when you run a2enmod, it symlinks them from mods-enabled. I highly recommend that you avoid the mods-enabled directory. Eventually, something bad will happen if you touch anything there manually (yes, that’s experience talking). Edit the files in mods-available.

My ldap.conf, for comparison:
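
A sketch of the sort of file I ended up with; the bind account, its password, and the siron.int names are placeholders from my lab, so substitute the DNs you collected in the previous steps:

    LDAPSharedCacheSize 500000
    LDAPCacheEntries 1024
    LDAPCacheTTL 600
    LDAPOpCacheEntries 1024
    LDAPOpCacheTTL 600

    <AuthnProviderAlias ldap ldap-accounts>
        AuthLDAPBindDN "CN=svc-nagios-ldap,OU=Accounts,DC=siron,DC=int"
        AuthLDAPBindPassword NotMyRealPassword
        AuthLDAPURL ldap://siron.int:389/OU=Accounts,DC=siron,DC=int?sAMAccountName?sub?(objectClass=user)
    </AuthnProviderAlias>

    <AuthnProviderAlias ldap ldap-users>
        AuthLDAPBindDN "CN=svc-nagios-ldap,OU=Accounts,DC=siron,DC=int"
        AuthLDAPBindPassword NotMyRealPassword
        AuthLDAPURL ldap://siron.int:389/CN=Users,DC=siron,DC=int?sAMAccountName?sub?(objectClass=user)
    </AuthnProviderAlias>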

The initial lines are not required, but can make the overall experience a bit smoother. I am just using defaults; I didn’t tune any of those lines. The only thing to be aware of is that Apache will not see directory changes, including lockouts and disabled accounts, until its cached entries expire.

A breakdown of the AuthnProviderAlias sections:

  • In the opening brackets, AuthnProviderAlias and ldap must be the first two parts. We are triggering the authn provider framework and telling it that we’re specifically working with the ldap provider. Where I used ldap-accounts and ldap-users, you can use anything you like. I named them after the specific OUs that I selected. Whatever you enter here will be used as reference tags in directories.
  • For AuthLDAPBindDN, use the distinguished name of the read-only Active Directory user that you created at the beginning of this section. You can omit the quotes if you have no spaces in the DN, but I would recommend keeping them just in case.
  • For AuthLDAPBindPassword, use the password of the read-only Active Directory account. Do not use quotes unless there are quotes in the password. If your password contains a quote, I recommend changing it.
  • For AuthLDAPURL, use the distinguished name of the OU to search. Use one? instead of sub? if you don’t want it to search sub-OUs.

Note that the Users folder uses CN, not OU.

TLS/SSL/LDAPS Configuration for Apache and LDAP

You should be able to authenticate with TLS or LDAPS if configured in your domain. I couldn’t get that to work because of the state of my domain. I have made it work elsewhere, so I can confirm that it does work. If you want to try on your own, I will tell you this right off: find the “LogLevel” line in your Apache configs and bump it up to “Debug” until you have it working, or you’ll have no idea why things don’t work. The logs are output to somewhere in /var/log/httpd or /var/log/apache2, depending on your configuration/distribution (the file is usually ssl_error_log, but it can be overridden, so you might need to dig a tiny bit). You can go through Apache’s documentation on this mod for some hints. You need at least:

  • LDAPTrustedGlobalCert CA_BASE64 /path/to/your/domain/CA.pem in some Apache file. I use the built-in ldap.conf or 01-ldap.conf for Apache. If you download the certificate chain from your domain CA’s web enrollment server, you can extract the subordinate’s certificate and convert it from P7B to PEM.
  • LDAPTrustedMode SSL in some Apache file if you will be using LDAPS on port 636. I normally keep it near the previous entry. Note: you can also just append SSL to any of the AuthLDAPURL entries for local configuration instead of global. In your AuthLDAPURL lines, you must change ldap: to ldaps: and append :636 to the hostname portion. Ex: AuthLDAPURL ldaps://siron.int:636/OU=Accounts,DC=siron,DC=int?sAMAccountName?sub?(objectClass=user)
  • LDAPTrustedMode TLS in some Apache file if you will be using TLS on port 389. I normally keep it near the previous entry. Note: you can also just append TLS any of the AuthLDAPURL entries for local configuration instead of global.
  • In the <Directory> fields that attach to AD, you need: LDAPTrustedClientCert CERT_BASE64 /var/certs/your-local-system.pem. It might also work in the AuthnProviderAlias blocks; I haven’t yet been able to try.
  • You might need to add LDAPVerifyServerCert off to an Apache configuration. I don’t like that, because it eliminates the domain controller authentication benefit of using TLS or LDAPS. Essentially, if you can get openssl s_client -connect your.domain.address:636 -CAfile ca-cert-file-from-first-bullet.pem to work, then you will be fine.

The hardest part is usually keeping LDAPVerifyServerCert On. First, use openssl s_client -connect your.domain.controller:636 -CAfile your.addomain.cafile.pem. It will display a certificate. Paste that into a file and save it. Then, use openssl verify -CAfile your.addomain.cafile the.file.you.pasted. If that says OK, then you should be able to get SSL/TLS to work.

Because security is our goal here and I couldn’t get TLS or LDAPS to work, I did run a Wireshark trace on the communication between the Nagios system and my domain controller. It does pass the user name in clear-text, but it does not transmit the password in clear text. I don’t love it that the user names are clear, but I also know that there are much easier ways to determine domain user accounts than by sniffing Apache LDAP authentication packets. There are also easier ways to crack your domain than by spoofing a domain controller to your Nagios system. If you can’t get TLS or LDAPS to work, it won’t be the weakest link in your authentication system.

Note 1: Be very, very, very careful about typing. You’re handling your directory in read-only mode, so I wouldn’t worry about breaking the directory. What you need to worry about is the very poor error reporting in this module. I lost an enormous amount of time over a hyphen where an equal sign should have been. It was right on the side-scroll break of nano so I didn’t see it for a very long time. The only error that I got was AH01618: user esadmin not found: /., or whatever account I was trying to authenticate. If things don’t work, slow down, check for typos, check for overrides from other conf files.

Note 2: I will happily welcome verifiable assistance on improving this section. If you just throw URLs at me, they’d better contain something that I didn’t find on any of the 20+ pages that made big promises without delivering, and the directions had better work. For example, using port 3268 to authenticate against the global catalog vs. 389 LDAP or 636 LDAPS does not do anything special for trying to authenticate the entire directory.

Configure Apache Directory Security for LDAP Authentication

From here, the Apache portion is relatively simple. Assuming that you already have a Nagios directory configured, just compare with mine:
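
A sketch of the CGI directory block with the one meaningful change applied; the share directory block gets the same AuthBasicProvider treatment:

    <Directory "/usr/local/nagios/sbin">
        Options ExecCGI
        AllowOverride None
        AuthName "Nagios Access"
        AuthType Basic
        AuthBasicProvider file ldap-users ldap-accounts
        AuthUserFile /usr/local/nagios/etc/htpasswd.users
        Require valid-user
    </Directory>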

The default Nagios site created by the Nagios installer contains a lot of fluff, which I’ve removed. For instance, I don’t check the Apache version because I know what version it is. There’s only one major change, though: look at the AuthBasicProvider line. Yours, if you’re using the default, just says file. Mine also says ldap-users ldap-accounts. Those are the tags that I applied to the providers in the previous sub-section. By leaving file in there, I can still use the original “nagiosadmin” account, as well as any others that I might have created. If you create additional providers for other OUs, just keep tacking them onto the AuthBasicProvider lines.

On the AuthBasicProvider line, order is important. I placed file first because I want accounts to be verified there first. The majority of my accounts will be found in Active Directory, but the file is only a couple of lines and can be searched in a few milliseconds. If I need to reach out to the directory for an uncached account, that will cause a noticeable delay. For the same reason, order your LDAP locations wisely.

Test

We’re not quite done; Nagios still doesn’t know what to do with these accounts. However, stop right now and go make sure that AD authentication is working.

sudo service apache2 restart or sudo service httpd restart, depending on your distribution. If Apache doesn’t restart successfully, use sudo journalctl -xe to find out why. Fix that, and move on. Once Apache successfully restarts, access your site at https://yournagiosite.yourdomain.yourtld. Log in using an Active Directory account inside a selected OU. You do not need to prefix it with the domain name.

If all is well, you should be greeted with the Nagios home page. Click any of the navigation links on the left. The pages should load, but you should not be able to see anything — no hosts, no services, nothing. If so, that means that Apache has figured out who you are, but Nagios hasn’t. You can double-check that at the top left of most any of the pages; for instance, the Tactical Overview shows the current account in its “Logged in as” line.

Do not move past this point until AD authentication works.

Configure Nagios to Recognize Active Directory Accounts

Truthfully, Nagios doesn’t know an AD account from a file account. All it knows is that Apache is delivering an account to it. It will then look through its registered contacts for a match. So, in /usr/local/nagios/etc/objects/contacts.cfg, I have:
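
A minimal example; esadmin is the account from my lab, and the alias and email are placeholders:

    define contact {
        contact_name    esadmin
        use             generic-contact
        alias           Domain account
        email           esadmin@siron.int
    }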

From there, add that account to groups, services, hosts, etc. as necessary. So, if your CFO wants a dashboard to show him that the accounting server is working, add his AD account accordingly. An account will only be shown its assigned items.

Remember, after any change to Nagios files, you must:
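
Verify the configuration, then restart the service (adjust the restart command to your distribution’s service manager):

    sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    sudo service nagios restart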

Note on cgi access: by default, only the “nagiosadmin” account can access the CGIs (most of the headings underneath the System menu item at the bottom left). That access is controlled by several “authorized_” lines in /usr/local/nagios/etc/cgi.cfg. As you become accustomed to using multiple accounts in Nagios, you’ll begin plopping them into groups for easier file maintenance. In this particular .cfg file, groups don’t mean anything. I found some Nagios documentation that insists that you can use groups in cgi.cfg, but I couldn’t make that work. You’ll have to enter each account name that you want to access any CGI.

Step 6: Configure check_nrpe and NSClient++ for SSL

After all that you’ve been through in this article, I hope that this serves as comfort: the rest is easy.

We’re going to take three major actions. First, we’ll create a “client” certificate for the check_nrpe utility, and then we’ll create a “server” certificate to be used with all of your NSClient++ systems. After that, we deploy the certificate to monitored systems and configure NSClient++ to use it.

Configure a Certificate for check_nrpe

This part is almost identical to the creation of the SSL certificate for the Apache site. You need to set up a config file to feed into openssl (or modify the default, but I don’t recommend that).

You have two choices:

  1. Edit your existing openssl.cnf file with the differences for the new certificate(s).
  2. Copy your existing openssl.cnf file, make the changes to the copy, and override the openssl command to use the copied file.

I suppose a third option would, again, be to hack at the openssl command line to manually insert what you want. I’m going with option 2 this time, as well.

Of course, it’s not a requirement to use nano. Use the editor you prefer.

The following shows sample replacements and additions to the file. They are not sufficient on their own!

The req_extensions line already exists in the default sample config, but has a hash mark in front of it to comment it out. Remove that (or type a new line, whatever suits you). The [ v3_req ] section probably exists; whatever it’s got, leave it. Just add the subjectAltName line. The [ alt_names ] segment won’t exist (probably). Add it, along with the DNS and IP entries that you want. I don’t know precisely how the check_nrpe tool presents itself to remote systems, or even if the monitored systems care beyond receiving a valid certificate, so set the DNS and IP entries to cover all of your bases.

Now, create the certificate request:

[Enter] through the questions (unless you need to change a default). Do not use a password when asked.

Lock the private key down:

Submit the CSR to your CA. I’m going to use the CA that I created on my Nagios system:

First, openssl will ask you to supply the password for the CA’s private key. Next, you’ll be shown a preview of the certificate and asked twice to confirm its creation.

The generated file will appear in your CA’s configured output directory (the one used in these directions is /var/ca/newcerts). It will use the next serial from your /var/ca/serial file as the name. So, if you’re following straight through, that will be /var/ca/newcerts/02.pem. You can ls /var/ca/newcerts to see them all. The highest number is the one that was just generated. Verify that it’s yours:

Transfer the certificate to the location that you’ll have check_nrpe load it from, and, for convenience, rename it:

Now “all” you need to do is go around and change every instance of check_nrpe in your object files to use the new certificate and never forget to use it on all new check_nrpe commands and change all those instances if you ever change something about the certificate. Who wants to do that? Oh, right, no one wants to do that.

So, let’s do this instead:
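
Create a small wrapper script alongside the other plugins; the name is my own convention:

    sudo nano /usr/local/nagios/libexec/check_nrpe_secure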

In the nano screen, paste this:
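
Something along these lines, assuming the certificate and key names you chose for check_nrpe and the CA certificate from Step 2; adjust the three file paths to match your own:

    #!/bin/sh
    # Call check_nrpe with the client certificate, key, and CA created earlier,
    # passing along whatever arguments the calling command supplies.
    /usr/local/nagios/libexec/check_nrpe \
      -C /var/certs/check_nrpe_cert.pem \
      -K /var/certs/check_nrpe_key.pem \
      -A /var/ca/certs/ca_cert.pem \
      -L 'ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH' $*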

Test:
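
Make the wrapper executable (match the ownership of your other plugins), then point it at a monitored host; the IP here is a placeholder:

    sudo chown nagios:nagios /usr/local/nagios/libexec/check_nrpe_secure
    sudo chmod +x /usr/local/nagios/libexec/check_nrpe_secure
    sudo /usr/local/nagios/libexec/check_nrpe_secure -H 192.168.25.50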

Note: In the original run of this document, I did not include the -L 'ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH' portion, and both encrypted and unencrypted NSClient++ installations worked just fine. Two weeks later, all checks began failing over cipher mismatches. There was no indication as to why they worked up until that moment. So, you may need to test your unencrypted systems WITHOUT the -L bit, then add it back in later. The $* after the -L is required.

Assuming that you used a valid IP for a system running NSClient++ on the default port, you should receive something similar to:

It doesn’t matter if the NSClient++ system is still configured to use insecure mode (mind the note above). If it says that you used incorrect parameters, that means that you didn’t compile check_nrpe with SSL support. Go back and do that (I gave instructions in all of my Nagios articles; if you update to the most recent version of check_nrpe then SSL is implicitly configured). For any other problems, you may need additional parameters, ex: to override the port. Use whatever is working in your existing commands.

Once you have this working, you can go into all of your command files and swap out all instances of “check_nrpe” for “check_nrpe_secure” with a simple Find/Replace operation. That’s all that you’ll need to remember to do going forward. If you need to change anything about the certificate(s), then you only need to modify check_nrpe_secure itself.

Configure a Certificate for NSClient++

You’ve seen this game a couple of times now. The primary difference between this and previous certificate generation is that I will not be using any subject alternate names for the NSClient++ certificate. Working from the assumption that you don’t want to sit around generating CSRs and delivering certificates to every single monitored system, I’m going to have you create one certificate to use on all of your monitored hosts. I suppose that’s a slight security risk, because anyone with the certificate set could pretend to be one of your monitored hosts. I think if they can spoof your system that well, being able to fool Nagios will be the least of your concerns. This exercise satisfies our primary goal of encrypting the Nagios<->monitored system traffic with a nice touch of allowing the monitored systems to authenticate the Nagios system. Anything more is mostly extra effort.

Generate the CSR first. You can do this from pretty much anywhere, but I’ll continue to use the pattern that we’ve established in this article:

Of course, it’s not a requirement to use nano. Use the editor you prefer.

The following shows sample replacements to the file. They are not sufficient on their own!

All of the changes that we made for the other certificates are unnecessary here. In fact, you could probably feed in the default openssl.cnf and just change the commonName when prompted. To each their own.

Create the certificate request:

[Enter] through the questions (unless you need to change a default). Do not use a password when asked.

Locking down this particular private key file won’t do much because you’re going to be copying it to all of your monitored hosts anyway.

Submit the CSR to your CA. Again, I’m using my Nagios system’s CA:

First, openssl will ask you to supply the password for the CA’s private key. Next, you’ll be shown a preview of the certificate and asked twice to confirm its creation.

The generated file will appear in your CA’s configured output directory (the one used in these directions is /var/ca/newcerts). It will use the next serial from your /var/ca/serial file as the name. So, if you’re following straight through, that will be /var/ca/newcerts/03.pem. You can ls /var/ca/newcerts to see them all. The highest number is the one that was just generated. Verify that it’s yours:

Transfer the certificate and private key to some distribution point. While I’m not overly concerned about securing the key this time, do perform some basic due diligence. No need to just open the door and invite attackers in. I do generally copy it over to /var/certs just for consistency (avoids the “now, where did I put that” problem).

Configure NSClient++ for Certificate Security

Start by making sure that all checks that target this host use the new check_nrpe_secure. If they use the original check_nrpe without certificate information, all checks to the host will fail once you complete these steps.

You need three things to proceed: the certificate that you created for NSClient, the private key that you created for NSClient, and the certificate for the CA that signed the NSClient certificate. You do not need the CA’s private key. Just leave that where it is.

Copy the three files to C:\Program Files\NSClient++\security.

Edit your nsclient.ini file. Here’s mine:
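
A trimmed sketch of the NRPE-related sections, assuming NSClient++ 0.5.x setting names and the file names used in these examples; keep whatever else your existing ini already defines (modules and so on), and use your Nagios server’s real IP for allowed hosts:

    [/settings/default]
    ; only the Nagios server may talk to this agent
    allowed hosts = 192.168.25.5

    [/settings/NRPE/server]
    ; files copied into the NSClient++ security folder
    certificate = ${certificate-path}/nsclient_cert.pem
    certificate key = ${certificate-path}/nsclient_key.pem
    ca = ${certificate-path}/ca_cert.pem
    verify mode = peer-cert
    insecure = false
    allowed ciphers = ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH
    allow arguments = true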

Restart the service (Restart-Service nscp or net stop nscp && net start nscp).

From here, you just need to come up with a deployment technique that transfers the certificates, key, and ini file to all other systems, then restarts the nscp service.

 

How to Monitor Hyper-V with Nagios: CentOS Edition


Are you monitoring your systems yet? I’m fairly certain that I make a big deal about that every few articles, don’t I? Last year, I wrote an article showing how to install and configure Nagios on Ubuntu Server. I’ve learned a few things in the interim. I also realize that not everyone wants to use Ubuntu Server. So, I set out on a mission to deploy Nagios on another popular distribution: CentOS. I’m here to share the procedure with you. Follow along to learn how to create your own monitoring system with no out-of-pocket cost. If you’re new to CentOS Linux on Hyper-V, we suggest you check out our post on the subject first.

Included in this Article

This article is very long because I believe in detailed instructions that help the reader understand why they’re typing something. It looks much worse than it is. I’ll cover:

  • A brief description of Nagios Core, the monitoring system
  • A brief description of NSClient++, the agent that operates on Windows/Hyper-V Servers to enable monitoring
  • Configuring a CentOS system to operate Nagios
  • Acquiring and installing the Linux packages
  • A discussion on the security of the NRPE plugin. Take time to skip down to this section and read it. The NSClient++/NRPE configuration that I will demonstrate in this article presents a real security concern. I believe that the risk is worthwhile and manageable, but you and/or your security team may disagree. Decide before you start this project, not halfway through.
  • Acquiring and installing the Windows packages
  • Configuring basic Nagios monitoring
  • Nagios usage

Not Included in this Article

While comprehensive, this article isn’t entirely all-encompassing. I am going to get you started on configuring monitoring, but you’re going to need to do some tinkering and investigation on your own. Building an optimal Nagios system requires practice more than rote instruction and memorization.

I have designed several monitoring scripts specifically for Hyper-V and failover clusters. These can be found in our subscriber’s area. Currently, the list includes:

  • Checking the free space of a Cluster Shared Volume
  • Checking the status of a Cluster Shared Volume (Redirected Access, etc.)
  • Checking the age of checkpoints
  • Checking the expansion percentage of dynamically-expanding VHD/Xs
  • Checking the health of a quorum witness

Since Nagios will alert you if any of these resources get into trouble, you can begin using those features without fear that they’ll break something while you’re not paying attention. Some of those monitors are shown in this grab of Nagios’ web interface:

Nagios Sample Data

What is Nagios Core?

Nagios Core is an open source software tool that can be used to monitor network-connected systems and devices. It processes data from sensors and separates the results into categories. By name, these categories are OK, Warning, and Critical. By default, Nagios Core sends a repeating e-mail when a sensor is in a persistent Warning or Critical state and a single “Recovery” e-mail when it has returned to the OK state.

Sensors collect data by “active” and/or “passive” checks. Nagios Core initiates active checks by periodically triggering plug-ins. Passive checks are when remote processes “call home” to the Nagios Core system to report status to a plug-in. The plug-in then delivers the sensor data to Nagios Core.

These plug-ins give Nagios Core its flexibility. Several plugins ship alongside Nagios Core. The Nagios community makes others available separately. Some are only included with the paid Nagios XI, which I will not cover. A plug-in is simply a Linux executable that collects information in accordance with its programming and returns data in a format that Nagios can parse.

Nagios Core provides multiple configurable options. One that we will be using is its web interface — a tiny snippet is shown in the screenshot above. This interface is not required, but grants you the ability to visually scan your environment from an overview level down to the individual sensor level. It also gives you other abilities, such as “Acknowledging” a Warning or Critical state and re-scheduling pending checks to make the next one occur very quickly (for testing) or much later (for repairs).

Isn’t Nagios Core Difficult?

Nagios Core has a reputation for being difficult to use, which I don’t think is appropriate. I believe that it got that reputation because you configure it with text files instead of in some flowery GUI. Nagios XI adds simpler configuration, but many will find that the cost jump from Core to XI makes editing text files more attractive.

Fortunately, the default installation includes templates that not only show you exactly what you need to do, but also give you the ability to set things up via copy/paste and only a bit of typing. Personally, I found the learning curve to be very steep but also very short. Overall, I find Nagios Core much easier to use than the monitoring component of Microsoft’s full-blown System Center Operations Manager.

From here on out, I’m only going to use “Nagios” to mean “Nagios Core”.

What is NSClient++?

NSClient++ is a small service application that resides on Windows systems and interacts with a remote Nagios system. Since Nagios runs on Linux, it cannot perform a number of common Windows tasks. NSClient++ bridges the gap. Of its many features, we will be using it as a target for the “check_nt” and “check_nrpe” Nagios plug-ins. Upon receiving active check queries from these two plug-ins, it performs the requested checks and returns the data to those plug-ins.

Prerequisites for Installing Nagios and NSClient++

I’ve done my best to make this a one-stop experience. You’ll need to bring these things for an optimal experience:

  • One installation of CentOS. You can use a virtual machine, with these guidelines:
    • 2 vCPU, 512MB startup RAM, 256MB minimum RAM, 1GB maximum RAM, and a 40GB disk. Mine uses around 600MB of RAM in production, a negligible amount of CPU, and the VHDX has remained under 2GB.
    • Assign a static IP or use a DHCP reservation. You will be configuring NSClient++ to restrict queries to that IP.
    • If you only have one Hyper-V host, find some piece of hardware to use for Nagios. If you don’t have anything handy, check Goodwill or garage sales. You don’t want your monitoring system to be dependent on your only Hyper-V system.
    • I recommend against clustering the virtual machine that holds your Nagios installation. The less it depends upon, the better. In a 2-node Hyper-V cluster, I configure one Nagios system on internal storage on one node and a second Nagios system on the second node that does nothing but monitor the first.
    • Refer to my prior article if you need assistance installing CentOS; it includes instructions for running it in Hyper-V.
  • Download NSClient++ for the Windows/Hyper-V Servers to monitor. If you only have a few systems, the MSI will be the easiest to work with. If you have many, you might want to get the ZIP for Robocopy distribution.
    • Note: If using the ZIP, install the latest VC++ redistributable on target systems. Without the necessary DLLs, the NSClient service will not run and does not have the ability to throw any errors to explain why it won’t run.
  • NRPE (Nagios Remote Plugin Executor). This will run on the Nagios system.
  • WinSCP (optional). You can get by without WinSCP, but it makes Nagios administration much easier. See my previously linked article on CentOS for a WinSCP primer.
  • PuTTY (optional). You could also get by without PuTTY, if you absolutely had to. I wouldn’t try it. The linked CentOS article includes a primer for PuTTY as well.
  • Download Nagios Core and the Nagios plugins from nagios.org to your management computer. More detailed instructions follow.

Software Versions in this Document

This article was written using the following software versions:

  • Nagios Core 4.3.1
  • Nagios Plugins 2.2.1
  • NSClient++ 0.5.0.62
  • NRPE 3.1.0. For our purposes, a 2.x version would be fine as well because v3.x needs to downgrade its packets to talk to NSClient++ anyway.

Downloading Nagios Core and Transferring it to the Target System

Start on Nagios.org. I’ll give step-by-step instructions that worked for me. Seeing as how this is the Internet, things might be different by the time you read these words. Your goal is to download Nagios Core and the Nagios Plugins.

  1. From the Nagios home page, hover over Downloads at the top right of the menu. Click Nagios Core.
  2. You’ll be taken to the editions page. Under the Core column, click Download.
  3. If you want to fill in your information, go ahead. Otherwise, there’s a Skip to download link.
  4. You should now be looking at a table with the latest release and the release immediately prior. At the far right of the table are the download links. For reference, the version that I downloaded said nagios-4.3.1.tar.gz. Click the link to begin the download. Don’t close this window.
  5. After, or while, the main package is downloading, you can download the plugins. You can hover over Downloads and click Nagios Plugins, or you can scroll down on the main package download screen to Step 2 where you’ll find a link that takes you to the same page.
  6. You should now be looking at a similar table that has a single entry with the latest version of Nagios tools. The link is at the far right of this table; the one that I acquired was nagios-plugins-2.2.1.tar.gz. Download the current version.
  7. If you didn’t already download NRPE, do so now.
  8. Connect to your target system in WinSCP (or whatever other tool that you like) and transfer the files to your user’s home folder. I tend to create a Downloads folder (keep in mind that Linux is case-sensitive), but it doesn’t really matter if you create a folder or what you call it as long as you can navigate the system well enough to find the files.

Note: You could use the wget application to download directly to your CentOS system. I never download anything from the Internet directly to a server.

Prepare CentOS for Nagios

Nagios depends on a number of other packages for its installation and operation. These steps were tested on a standard deployment of CentOS, but should also work on a minimal build.

Download and install prerequisite packages:
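
This is the package set that the standard Nagios Core instructions call for on CentOS 7, plus make and openssl-devel so that the NRPE plugin can be compiled with SSL support:

    sudo yum install gcc glibc glibc-common make wget unzip httpd php gd gd-devel perl postfix openssl-devel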

Then, we’ll create a user for operating Nagios.
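
Create the account and give it a password:

    sudo useradd nagios
    sudo passwd nagios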

Upon entering the passwd command, you’ll be asked to provide a password. I don’t want to tell you what to do, but you should probably keep note of it.

Next, we’ll create a security group responsible for managing Nagios and populate it with the new “nagios” user and the account that Apache runs under.
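
The group name here (nagcmd) is the one the standard Nagios instructions use; it has to match the --with-command-group value in the build step below:

    sudo groupadd nagcmd
    sudo usermod -a -G nagcmd nagios
    sudo usermod -a -G nagcmd apache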

Install or Upgrade Nagios on CentOS

Now we’re ready to compile and install Nagios.

First, you need to extract the files.

Note: I am not using sudo for the extraction! If you run the extraction with sudo, then you will always need to use sudo to manipulate the extracted files.

Note: I am using the directory structure and versions from my WinSCP screenshot earlier. If you placed your files elsewhere or have newer versions, this is an example instead of something you can copy/paste.
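
Assuming the Downloads folder and file versions shown above:

    cd ~/Downloads
    tar xzf nagios-4.3.1.tar.gz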

Execute the following to build and install Nagios.

Note: If upgrading, STOP after sudo make install or your config files will be renamed and replaced with the new defaults!

Note: Do not copy/paste this entire block at once. Run each line individually and watch for errors in the output.
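
The usual sequence looks like this, assuming the nagcmd group created earlier:

    cd ~/Downloads/nagios-4.3.1
    ./configure --with-command-group=nagcmd
    make all
    sudo make install
    sudo make install-commandmode
    sudo make install-init
    sudo make install-config
    sudo make install-webconf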

Only if new install. Set Nagios to start automatically when the system starts.
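
On CentOS 7, one of these should do it, depending on whether install-init gave you a systemd unit or a SysV init script:

    sudo systemctl enable nagios
    # or, for a SysV-style init script:
    sudo chkconfig nagios on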

Only if upgrading. Instruct CentOS to refresh its daemons and restart the newly replaced Nagios executable.
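
On CentOS 7:

    sudo systemctl daemon-reload
    sudo systemctl restart nagios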

Install or Upgrade Nagios Plugins on CentOS

The plugin installation process is similar to the Nagios installation process, but shorter and easier.

Unpack the files first. The same notes from the Nagios section are applicable.
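
Again assuming the folder and version from earlier:

    cd ~/Downloads
    tar xzf nagios-plugins-2.2.1.tar.gz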

Compile and install the plugins. The same notes from the Nagios section are applicable, especially the bit about taking this one line at a time.
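
The usual sequence, with the nagios user and group we created:

    cd ~/Downloads/nagios-plugins-2.2.1
    ./configure --with-nagios-user=nagios --with-nagios-group=nagios
    make
    sudo make install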

You’ve just installed several plugins for Nagios, most of which I’m not going to show you how to use. If you’d like to take a look, navigate to /usr/local/nagios/libexec:
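
For example:

    cd /usr/local/nagios/libexec
    ls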


Most of them have built-in help that you can access with a -h or --help:

You can also search on the Internet for assistance and examples.


Security and the NRPE Plugin

The next step is to compile and install the NRPE plugin. Before that, we need to stop and have a serious chat about it.

You might read in some places that the NRPE plugin is a security risk. That is correct. It allows one computer to tell another computer to run a script and return the results. Furthermore, we’re going to be sending arguments (essentially, parameters) to those scripts. Doing so opens the door to injection attacks. One method that has been used to combat the issue is NRPE traffic encryption. I am not going to be exploring how to encrypt NRPE communications at this time.

I have several reasons for this:

  • The simplest reason is that it’s difficult to do, and I’m not certain how much value there is in the effort.
  • Encryption is often mistaken for data security when it is, in fact, more about data privacy. For example, if you transmit your password in encrypted format and the packet is intercepted, the attacker still has your password. The fact that it’s encrypted might be enough to put the attacker off, but any encryption can be broken with sufficient time and effort. Therefore, your password is only private in the sense that no casual observer will be able to see it. To keep it secure, you should not transmit it at all. We don’t really have that option. Because we are not encrypting, what an attacker could see is the command string and the result string. You’ll have full knowledge of what those are, so you can decide how serious that is to you. Our best approach is to ensure that the Nagios<->host communications chain only occurs on secured networks, even if we later enable SSL.
  • The author of NSClient++ had the good sense to ensure that you can’t operate just any old free-form script via NRPE. Scripts must be specifically defined and can be tightly controlled. If the script itself is sufficiently well-designed, a script injection attack should be prohibitively difficult. That still leaves the door open to data snooping, so take care in what data your checks return.
  • The author of NSClient++ also coded in the ability to restrict NRPE activities to specific source IP addresses. IP spoofing is possible, of course.
  • Windows, Linux, and/or hardware firewalls can help enforce the source and destination IP communications. Spoofing is still a risk, of course.
  • I ran a Wireshark trace on a Nagios-to-NSClient++ communications channel. Nothing was transmitted in clear text. There were changes made in NRPE 3.x that led me to believe that it might be performing some encryption. Then again, it might just be Base64 encoding. Either way, no casual observer will be able to snoop it.

What I didn’t address in the above points is that NSClient++ could effectively authenticate the Nagios computer by only accepting traffic that was encrypted with its private key. So, yes, NRPE is a security risk and it is a higher risk without SSL. I won’t try to convince you otherwise.

I believe that, for internal systems, the risk is very manageable. If you’re going to be connecting to remote client sites, I would put the entire Nagios communications chain inside an encrypted VPN tunnel anyway because even if you encrypt NRPE, the other traffic is clear-text. The only people that I think should worry much about this are those that will be connecting Nagios to Hyper-V hosts using unsecured networks. Personally, I’m uncertain how a case could be made to do that even with SSL configured.

I’m not saying that I’ll never look into encrypting NRPE. Just not now, not in this article.


Install or Upgrade the NRPE Plugin on CentOS

For our purposes, the NRPE plugin requires little effort to install.

If you’re not already in the folder that contains the NRPE gzip file, return there. Unpack the file just like you did with the others.
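For example, with a hypothetical 3.2.1 version number:

tar xzf nrpe-3.2.1.tar.gz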

Switch to the extracted folder. Compile and install the plugin. Part of the configure process includes creating a new SSL key. It requires several minutes to complete.
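A sketch of that sequence (substitute your own version number); only the check_nrpe plugin itself is built and installed:

cd nrpe-3.2.1
./configure
make check_nrpe
sudo make install-plugin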

Verify that the plugin was created:
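For example:

ls -l /usr/local/nagios/libexec/check_nrpe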

If CentOS responded by showing you a file, then all is well.

Connecting Apache to Nagios on CentOS

At this point, Nagios works and can begin monitoring systems. You’ll need to do some extra work to get the web interface going.

Start by creating an administrative web user account. This account belongs to Apache, not CentOS. As created in this article, it will have full access to everything in the web interface.
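The htpasswd tool handles this:

sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin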

You will be prompted to enter a password for the “nagiosadmin” account. Be aware that the -c parameter causes the file to be created anew. If it already exists, it will be overwritten.

You also need to add the CentOS account that Apache uses to the security group that can access Nagios’ files:
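Assuming the default apache account and the nagcmd group from earlier (it does no harm if you already ran this when you created the group):

sudo usermod -a -G nagcmd apache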

Next, set Apache to start when the system boots:
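On a systemd-based release:

sudo systemctl enable httpd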

Allow port 80 through the firewall:
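Assuming the stock firewalld configuration:

sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --reload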

By default, the Security-Enhanced Linux module will prevent several of Nagios’ CGI modules from operating. If you want to quickly get around that, simply disable SELinux. Since we’re not hosting an online banking website or anything, I feel that this is an appropriate solution:
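Open the SELinux configuration in nano; the optional setenforce line switches to permissive mode immediately so that you don’t have to reboot for the file change to take effect:

sudo nano /etc/selinux/config
sudo setenforce 0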

When nano opens the config file, change the line that reads SELINUX=enforcing to SELINUX=disabled. Press [CTRL]+[X] when finished, then [Y] and [Enter] to exit. I tinkered with some of the other options and mostly managed to break my system. If you’re concerned, then this article might help you.

Apache on CentOS ships with a default page that we need to disable:
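One way to do that is to rename the welcome configuration file so that Apache ignores it:

sudo mv /etc/httpd/conf.d/welcome.conf /etc/httpd/conf.d/welcome.conf.disabled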

Start Apache:
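On a systemd-based release:

sudo systemctl start httpd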

Your Nagios installation can now be viewed at http://yourserver/nagios. If you want a quick test from the local command line:
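One way, assuming curl is installed (it will prompt for the nagiosadmin password):

curl -u nagiosadmin http://localhost/nagios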

If you get a 301 or a lot of HTML with an embedded message that your browser doesn’t work, that’s a good sign.

Configuring Apache to Use Nagios as the Default Site

If you don’t want to hang /nagios off of URL requests to your system, follow these directions.

Open /etc/httpd/conf.d/nagios.conf in any text editor. It’s a fairly large file, so WinSCP or Notepad++ will make the chore simpler. From nano:
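sudo nano /etc/httpd/conf.d/nagios.conf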

Add the following lines, either at the beginning or the end of the file. I’ve highlighted two lines where you’ll want to substitute your system details for mine:
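A minimal sketch of a block that accomplishes this; the first two lines are the ones to substitute with your own names (mine here are hypothetical):

<VirtualHost *:80>
    ServerName nagios.mydomain.mytld
    ServerAlias nagios
    RedirectMatch ^/$ /nagios/
</VirtualHost>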

Restart Apache for the settings to take effect:
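On a systemd-based release:

sudo systemctl restart httpd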

Your Nagios installation will now appear at http://yourserver, without the need to add /nagios. If you changed those first two lines and you add matching records to your internal DNS server, the system will also respond at the specified URL.

Configure Nagios to Send E-mail on CentOS

By default, Nagios will use the /usr/bin/mail executable to send e-mail. You need to configure Postfix for that to work. There are many ways that can be done, and I have neither the time nor the systems to test them all. Fortunately, a document already exists that can help you with the most common configurations. You can also find several how-tos from the postfix documentation page. I will show you how to get started and I’ll demonstrate the two methods that I know. For anything else, you’ll need to research on your own.

Initial Postfix Configuration on CentOS

It’s easy to get started:
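Something like the following, assuming a yum-based install (mailx provides the /usr/bin/mail binary that Nagios calls):

sudo yum install postfix mailx
sudo systemctl enable postfix
sudo systemctl start postfix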

The basic e-mail infrastructure is now on your system.

Relaying Mail Through a Friendly Mail Server

If you’ve got a mail server that will allow anonymous e-mail via port 25 connections (like an Exchange server that allows local addresses), you have very little to do.

Open /etc/postfix/main.cf in a text editor. This is a large file and you’re going to be doing a lot of navigating, so choose your editor wisely. For nano:
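sudo nano /etc/postfix/main.cf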

Make these changes:

  • Uncomment the #myhostname line (by removing the #). Change it to: myhostname = myserver.mydomain.mytld (Substituting your server and domain information). This is the host name that it will present to the mail server.
  • Uncomment the #myorigin line. Change it to myorigin = mydomain.mytld (Substituting your domain information). E-mails sent by this server will append that domain to the user name.
  • Uncomment one of the #inet_interfaces lines or add a new one. Change it to: inet_interfaces = loopback-only. This sets this server to not receive any inbound e-mail.
  • After the #mydestination lines, add this: mydestination = . This will also prevent this server from accepting e-mail.
  • Uncomment one of the #relayhost lines or add a new one in this format: relayhost = myrealmailserver.mydomain.mytld. Substitute the real name or IP address of the host that will relay e-mail for this server.
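Taken together, the edited lines would look something like this (host and domain names are hypothetical):

myhostname = nagios.mydomain.mytld
myorigin = mydomain.mytld
inet_interfaces = loopback-only
mydestination =
relayhost = myrealmailserver.mydomain.mytld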

Restart postfix:
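sudo systemctl restart postfix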

Relaying Mail Through Your ISP

Some of us don’t have our own mail server. If you’re paying for a static IP and have registered a domain name, then you could configure your new Postfix installation as a true mail server. But, most of us aren’t that lucky either. Instead, we can configure Postfix to log in to our ISP’s SMTP account and send e-mail as us. Credit to ProfitBricks.

Install the necessary security binaries:
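On CentOS, the SASL plain-authentication packages are the usual requirement:

sudo yum install cyrus-sasl cyrus-sasl-plain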

Open /etc/postfix/main.cf in a text editor. This is a large file and you’re going to be doing a lot of navigating, so choose your editor wisely. For nano:
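sudo nano /etc/postfix/main.cf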

Make these changes:

  • Uncomment the #myhostname line (by removing the #). Change it to: myhostname = myserver.mydomain.mytld (Substituting your server and domain information). This is the host name that it will present to the mail server.
  • Uncomment one of the #relayhost lines or add a new one in this format: relayhost = smtpserver.yourisp.tld. Substitute the real name or IP address of your ISP according to their instructions for SMTP connections. If your ISP requires a different port (ex: Gmail), use brackets around the host name, a colon, and the port: relayhost = [smtp.gmail.com]:587.
  • At the end of the file, add:
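These are typical SASL relay settings; the sasl_passwd path matches the file created in the next step:

smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = may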

Create a file to hold the username and password to be used with your ISP’s mail server:
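A common choice of location, matching the path used in the following steps:

sudo nano /etc/postfix/sasl_passwd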

Inside that file, you’re going to enter your information in this format: SMTP-server-exactly-as-entered-in-main.cf username:password. Two examples:
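Both of these lines are purely hypothetical; substitute your own server, account, and password:

myrealmailserver.myisp.tld          myaccount@myisp.tld:MyP@ssw0rd
[smtp.gmail.com]:587                myaccount@gmail.com:MyAppPassword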

Generate a lookup table for Postfix to retrieve the passwords:
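Assuming the file location from above:

sudo postmap /etc/postfix/sasl_passwd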

Restart postfix:
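sudo systemctl restart postfix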

Remove the clear-text file containing your password:
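Again assuming the location from above (the hashed .db file that postmap generated stays behind for Postfix to use):

sudo rm /etc/postfix/sasl_passwd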

Nagios Is Installed!

You’ve completed all the functionality steps for the server! Walk through the web pages and check for any issues. I followed all of these same steps through and ended up with a fully functional system. If you’re having troubles, check that all of the prerequisite components installed successfully. If you have issues in one browser, try another.

You don’t have any sensors set up yet, so your displays will be very dull. We’ll rectify that in a bit. First, we need to talk about the Windows agent NSClient++.

Installing NSClient++

If you didn’t download NSClient++ before, do so now. NSClient++ has multiple deployment options. For your very first, I highly recommend one of the MSI installs. If you’ve got many systems, you might opt to grab a ZIP distribution as well. You can then mass-push pre-defined configurations via Robocopy, login scripts, or other means.

The installation screens:

  1. I didn’t show the initial screen. The first that you care about asks you to select the Monitoring Tool. Choose Generic.
  2. Next, choose your installation type. I ordinarily pick Custom so that I can deselect the op5 options, but any will work.
  3. Pay at least some attention on this screen. Everything here will be written to the NSClient++.ini file, so you can change it all later. These are the appropriate options to use with Nagios, but I’ll discuss each after the list.
  4. Finish the installation.

The configuration items that I instructed you to choose:

  • Allowed hosts: This field is required. Any source IP not in the list will be rejected by NSClient++. You can use ranges (ex: 192.168.0/24).
  • Password: check_nt uses this password (-s switch). check_nrpe does not care. By default, Nagios has a single check_nt command item that you call from other sensors. If you wish to prevent password-sharing, you’ll need to duplicate the check_nt command for each separate password.
  • Common check plugins: These are built-in plugins that you can use with NSClient++. I don’t do much with them, but you might.
  • Enable nsclient server (check_nt): You will almost certainly use several check_nt sensors.
  • Enable NRPE server (check_nrpe): My Hyper-V test scripts depend upon NRPE.
  • Insecure legacy mode (required by old check_nrpe). Since we aren’t configuring certificates, this setting is required by the current check_nrpe as well.
  • Enable NSCA client: I’m not using this client, so I didn’t enable it.
  • Enable Web server: I just configure by text file, so I didn’t enable this, either.

Configuring NSClient++ to Work with PowerShell

You’ll need to modify NSClient++ to work with PowerShell. The installer doesn’t do that.

The ini file can be found at C:\Program Files\NSClient++\nsclient.ini. If you installed the 32-bit NSClient++ on 64-bit Windows, look in C:\Program Files (x86). The ini file is quite a mess. The following is my nsclient.ini file, with all of the fluff stripped away and the necessary lines added for PowerShell to function. I’ve highlighted what you must add:
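At minimum, the PowerShell-related pieces boil down to allowing script arguments and defining a ps1 wrapping; a sketch of those sections, assuming NSClient++’s documented wrapping syntax (your file will contain many more lines than this):

[/settings/external scripts]
allow arguments = true

[/settings/external scripts/wrappings]
ps1 = cmd /c echo scripts\%SCRIPT% %ARGS%; exit($lastexitcode) | powershell.exe -command -

[/settings/NRPE/server]
allow arguments = true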

The lines afterward show how I set up the commands and parameters for my customized scripts. The script bodies themselves are not included in this article (subscriber’s area, remember?).

Change your file to include the necessary lines and save the file. At an elevated command prompt, run:
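For example:

net stop nscp
net start nscp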

It’s a normal Windows service with the name “nscp”, so you can also use ‘services.msc’, sc, or the PowerShell Stop-Service, Start-Service, and Restart-Service commands.

After the above, run netstat -aon | findstr LISTENING. Verify that there is a line item for 5666 (check_nrpe) and a line for 12489 (check_nt).

If this host is not configured to run unsigned PowerShell commands, run this at an elevated PowerShell prompt:
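For example, to allow locally created unsigned scripts:

Set-ExecutionPolicy RemoteSigned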

Much has been written about the execution policy and I have nothing to add. You can do an Internet search to make your own decisions, of course. None of my scripts are signed, so you’ll need RemoteSigned or looser in order to use them.

That’s It!

Your deployment is complete! Now it’s time to learn how to manage Nagios and configure sensors.

Controlling Nagios

During Nagios sensor configuration, you’ll find that you spend a great deal of time managing the Nagios service. Nagios control from the Linux command line is very simple. You’ll soon memorize these commands. Activate them in a PuTTY session or a local console.

ALWAYS Check the Nagios Configuration

After making any changes to configuration files, verify that they are valid before attempting to apply them to the running configuration:
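sudo service nagios checkconfig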

If there are any problems, you’ll be told what they are and where to find them in the files. As long as you don’t stop Nagios, it will continue running with the configuration that it was started with. That gives you plenty of time to fix any errors.

Restart Nagios

Restart the Nagios service (only after verifying configuration!):
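sudo service nagios restart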

Stop and Start Nagios

If you need to take Nagios offline for a while and bring it up later (or if you forgot to checkconfig and have to recover from a broken setup), these are the commands:
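sudo service nagios stop
sudo service nagios start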

Verify that Nagios is Running

Usually, the ability to access the web site is a good indication of whether or not Nagios is operational. If you want to check from within the Linux environment:
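On a systemd-based release (this is the command that produces the scrollable output described below):

sudo systemctl status nagios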

This will usually fill up the screen with information, so you’ll be given the ability to scroll up and down with the arrow keys to read all of the messages. Press [Q] when you’re finished.

An Introduction to Nagios Configuration Files

From here on out, I will be using WinSCP to manipulate the Nagios configuration files on the Linux host. Use PuTTY to issue the commands to check and restart the Nagios service after configuration file changes. You do not need to restart the Apache service.

Personally, I connect using the nagios account that we created in the beginning. WinSCP remembers the last folder that it was in per user, so it’s easier for navigation and so that you never run into any file permission problems. Just make a separate entry to the host for that account:

WinSCP Nagios Site


Work your way to /usr/local/nagios/etc. This is the root configuration folder. It mostly contains information that drives how Nagios processes the other files.

Nagios Root etc


This location contains four files. I’m not going to dive into them in great detail, but I encourage you to open them up and give their contents a look-over to familiarize yourself.

  • cgi.cfg: As it says in the text, this is the primary configuration file. I have not changed anything in it.
  • htpasswd.users: This is the file that Apache checks when authenticating users to the web interface. Use the htpasswd instructions from earlier in this article to modify it.
  • nagios.cfg: This file contains a number of configuration elements for how Nagios interacts with the system. We are going to modify the OBJECT CONFIGURATION FILE(S) portion momentarily.
  • resource.cfg: This file holds customizable macros that you create, like the ones for e-mail.

Now, open up /usr/local/nagios/etc/objects. This is where the real work is done.

Nagios Configuration Folder in WinSCP


The file names are for your convenience only. Nagios reads them all the same way. So, don’t get agitated if you feel like a host template definition would be better in some file other than templates.cfg; Nagios doesn’t care as long as everything is formatted properly. This is what the files generally mean:

  • commands.cfg: This contains the commands that constitute the actual checks. For instance, check_ping is defined here.
  • contacts.cfg: When Nagios needs to tell somebody something, this is where those somebodies’ information is stored. It’s also where you connect users to time periods. For example, I have my administrative account in the business hours time period because I don’t really want to be woken up in the middle of the night because my test lab is unhappy.
  • localhost.cfg: Contains checks for the Linux system that runs Nagios.
  • printer.cfg: Define printer objects and checks here.
  • switch.cfg: Physical switches and their check definitions are in this file.
  • templates.cfg: Basic definitions that other definitions can inherit from are contained within.
  • timeperiods.cfg: You probably don’t want to be notified in the middle of the night when a switch misses a single ping, but you might want to know about it during normal work hours. Define what “normal work hours” and “leave me alone” time is in this file.
  • windows.cfg: Basic definitions for Windows hosts and checks.

Poke through these and get a feel for how Nagios is configured.

Nagios Objects and Their Uses

Nagios uses a few species of objects. Getting these right is important. Use the template file to guide you. The most pertinent objects are listed below:

  • contact: A target for notifications — usually an individual.
  • host: A host is any endpoint that can be checked. A computer, a switch, a printer, and a network-enabled refrigerator all qualify as a host.
  • command: Nagios checks things by running commands. The command files live in its plugins folder. The command definitions explain to Nagios how to call those plugins.
  • service: A “service” in Nagios is anything that Nagios can check with a command, and is a much more vague term than it is in Windows. In Nagios, services belong to hosts. So, if you want to know if a switch is alive by pinging it, the switch is a “host” and the ping is a “service” that calls a “command” called check_ping. You might call these “sensors” to compare to other products.
  • host group: Multiple hosts that are logically lumped together constitute a host group. Use them to apply one service to lots of hosts at once.
  • time period: This object is fairly well-explained by its name. They’re probably best understood by looking in the timeperiods.cfg file.

Nagios Templates

I’d say that the best place to start looking at Nagios objects is in the templates file. This is a copy/paste of the Contact template:
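On a stock installation, the generic contact template in templates.cfg looks roughly like this:

define contact{
        name                            generic-contact
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r,f,s
        host_notification_options       d,u,r,f,s
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        register                        0
        }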

Start with the define line that indicates what type of object this block is describing. Most importantly, it signals to Nagios which properties should exist. Within this particular block, all properties for a contact are present with specific settings for each. If you use this template with a new object, then these will be its default settings. Next, notice the register line. By setting it to 0, you make it unavailable for Nagios to use directly, which is what makes this definition a template. Now, look at an implementation of the above template:
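The stock nagiosadmin contact in contacts.cfg is one such implementation:

define contact{
        contact_name                    nagiosadmin
        use                             generic-contact
        alias                           Nagios Admin
        email                           nagios@localhost
        }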

It is also defined as a contact. First, notice the use line. Its name matches that of the template. That means that you don’t need to provide every setting for this contact, only the ones that you want to differ from the template. It is not necessary for an object to use a template. You can fill out all details for an object. A live object cannot use another live object, though one template can use another.

I often make backups of my configuration files before tinkering with them. WinSCP makes this simple with the Duplicate command. I also tend to copy my live configuration files to a safe place. Even though this whole thing seems easy to understand, you will make mistakes. Some of your mistakes are going to seem very stupid in retrospect. Always, always, always run sudo service nagios checkconfig before applying any new changes!

Nagios Hosts

A host in Nagios is an endpoint. It’s an easy definition in my case because I am going to specifically talk about Hyper-V hosts. The following are host definitions that I created for my environment:
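The host names and addresses here are hypothetical placeholders; the template that they reference appears next:

define host{
        use             hyper-v-server
        host_name       svhv01
        alias           Hyper-V Host 1
        address         192.168.25.10
        }

define host{
        use             hyper-v-server
        host_name       svhv02
        alias           Hyper-V Host 2
        address         192.168.25.11
        }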

These hosts use a template that I created:
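A sketch of that template; note the hostgroups line, which comes up again in the next section:

define host{
        name            hyper-v-server
        use             windows-server
        hostgroups      hyper-v-servers
        register        0
        }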

You’ll notice that this template uses the base windows-server template, but really makes no changes. I’m not overriding much in the windows-server template, so I could have all of my hosts use that one directly. However, creating a template to set up an inheritance hierarchy now is an inexpensive step that gives me flexibility later.

Nagios Groups

Most of the singular objects, like contacts and hosts, also have a corresponding group object. You might have noticed in my Hyper-V host template that it has a hostgroups property. Every host object that uses this template will be a member of the hyper-v-servers host group. Groups have very simple definitions:
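For example (the alias text is whatever you want displayed):

define hostgroup{
        hostgroup_name  hyper-v-servers
        alias           Hyper-V Servers
        }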

I could also have used a members property within the host group definition or a hostgroups property within my Hyper-V host definitions to accomplish the same thing. This is less typing.

Host groups are very useful. First, they get their own organization on the Host Groups menu item in the Nagios web interface:

Nagios Host Groups Display


Second, you can define services at the host group level. That’s important, because otherwise, you’d have to define services for each and every host that you want to check, even if they’re all using the same check!

Nagios Services

Don’t let the term service confuse you with the same thing in a Windows environment. In Nagios, a service has a broader, although still perfectly correct definition. Anything that we can check is a service, whether that’s a ping response, Apache returning valid information on port 80, or even the output of a customized script like I have created for Hyper-V items.

The following is a service that I have created to monitor a Windows service — the Hyper-V Virtual Machine Manager service, to be exact:
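A sketch of such a service, assuming the stock generic-service template and the built-in check_nt SERVICESTATE command (vmms is the short name of the Hyper-V Virtual Machine Management service):

define service{
        use                     generic-service
        hostgroup_name          hyper-v-servers
        service_description     Hyper-V Virtual Machine Management Service
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l vmms
        }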

Notice my use of hostgroup_name so that I only have to create this service one time. If I were creating a service for a specific host, I would use host_name instead.

I encourage you to look at the documentation for services. You may want to change the frequency of when checks occur. You may also want to redefine how long a service can be in a trouble state before you are notified.

Useful Nagios Objects Documentation

I’ve spent a little bit of time going over the objects within Nagios, but there is already a wealth of documentation on them. You will, no doubt, want to configure Nagios items on your own. NSClient++ also has a great deal more capability than what I’ve shown. These links helped me more than anything else:

Dealing with Problems Reported by Nagios

The web display is nice, and everyone enjoys seeing a screen-full of happy green monitor reports, but that’s not why we set up Nagios installations. Things break, and we want to know before users start calling. With the configuration that you have, you’ll have the ability to start getting notifications as soon as you set yourself up as a contact with valid information. When a problem occurs, Nagios will mark it as being in a SOFT warning or critical state, then it will wait to see if the problem persists for a total of three check periods (configurable). On the third check, it will mark the service as being in a HARD warning or critical state and send a notification.

If you fix a problem quickly, or if it resolves on its own, you’ll get a Recovery e-mail to let you know that all is well again. If the problem persists, you’ll continue getting an e-mail every few minutes (configurable). If one host has many services in a critical state, or if many separate hosts have issues, you’re going to be looking at a lot of e-mails.

The following screenshot shows what a service looks like in a critical state. You can see it on the Services menu item, the Services submenu under the Problems menu, and on the (Unhandled) link that is next to it.

Nagios Critical Service


If you click the link for the name of the service, in this case, Service: DNS, it will take you to the following details screen:

Nagios Service Detail


Take some time to familiarize yourself with this screen. I’m not going to discuss every option, but they are all useful. For now, I want you to look at Acknowledge this service problem.

Acknowledging Problems in Nagios

“Acknowledging” means that you are aware that the service is down. Once acknowledged, an acknowledgement notification will be sent out, but no further notifications will follow until the service recovers. Basically, you’re telling Nagios, “Yes, yes, I know it’s down, leave me alone!” Click the Acknowledge this service problem link as shown in the previous screen shot and you’ll be taken to the following screen:

Nagios Acknowledgement


You can read the Command Description for an extended explanation of what Acknowledgement does and what your options are. I tend to fill out the comments field, but it’s up to you. Upon pressing Commit, the notification message is sent out and Nagios stops alerting until the service recovers (sometimes you get one more problem notification first).

Rescheduling a Nagios Service Check

Nagios runs checks on its own clock. You might have a service that doesn’t need frequent checks, so you might set it to only be tested every hour. During testing, you certainly won’t want to wait that long to see if your check is going to work. You might also want that Recovery message to go out right away after fixing a problem. In the service detail screen as shown a couple of screen shots up, click the Re-schedule the next check of this service link:

Reschedule Nagios Service Check


Of course, the time in the screen shot doesn’t mean anything to you. It’s the exact moment that I clicked the link on my system. If you then click Commit, it will immediately run the check. It might still take a few moments for the results to be returned so you won’t necessarily see any differences immediately, but the check does occur on time.

Scheduling Downtime

Smaller shops might not find it important to schedule downtime. If your Hyper-V host can reboot in less than 15 minutes, then you might not even get a downtime notification using the default settings. However, Nagios will give you the ability to start providing availability reports. Wouldn’t it be nice to show your boss and/or the company owners that your host was only ever down during scheduled maintenance windows?

From the service detail screen shot earlier, you can see the Schedule downtime for this service link. I’m assuming that you’ll be more likely to want to set downtime on a host rather than an individual service. The granularity is there for you to do either (or both) as suits your needs. A host’s detail screen (not shown) has Schedule downtime for this host and Schedule downtime for all services on this host links. You can also schedule downtime for an entire host or service group. These screens all look like this:

Nagios Downtime Scheduler


During scheduled downtime, notifications aren’t sent. In all reports, any outages during downtime are in the Scheduled category rather than Unscheduled.

The default Nagios Core distribution does not have a way to automatically schedule recurring downtime. There are some community-supported options.

Nagios Availability Reports

You saw a link in the service details screen shot above to View Availability Report for This Service. Hosts and services have this link. There’s also an Availability menu in the Reports section on the left that allows you to build custom reports. The following is a simple host availability report:

Nagios Availability


This is only for a single day. Notice the report options in the top right.

User Management for Nagios

So far, I’ve had you use the “nagiosadmin” account. As you spread out your deployment, you’re going to also set up new contacts. If you like, you can restrict those contacts to only see their own systems.

First, add a user to the nagcmd group. This will allow them to configure Nagios’ files. Be careful! If you don’t trust someone, skip this part and handle configuration for them. Optionally, you can skip the usermod parts (adds them to a group) and give them targeted access to specific configuration files.
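A sketch, using “andy” as the example account carried through the rest of this section:

sudo useradd andy
sudo passwd andy
sudo usermod -a -G nagcmd andy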

Due to differences between the Linux security model and the Windows security model, there is no secure way for Apache on Linux to read your Windows user accounts. So, you need to create a completely separate account for the Nagios web interface:
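Note the absence of -c this time; the file already exists and must not be overwritten:

sudo htpasswd /usr/local/nagios/etc/htpasswd.users andy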

Now, create a Nagios contact with a matching contact_name:
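A sketch; the alias and e-mail address are placeholders:

define contact{
        contact_name    andy
        use             generic-contact
        alias           Andy
        email           andy@mydomain.mytld
        }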

The hosts, services, contact groups, etc. that the “andy” account is attached to will determine what Andy sees when he logs in to the Nagios web interface.

Using Nagios to Monitor Hyper-V – The real fun stuff starts here!

You now have all the tools you need to build your Hyper-V monitoring framework with Nagios. I’ve also written a few scripts and services that will get you up and running: Required Base Scripts, Monitoring the Oldest Checkpoint Age, Monitoring Dynamically Expanding VHDX Size, and more.

If you’d like to pick up these scripts and services, please register below to get access!


Nagios for Hyper-V: Alert on Failed Quorum


 

The health of a Microsoft Failover Cluster’s quorum leans most heavily on the state of the nodes. If you’re already using Nagios to monitor individual node states, then you’ll find out very quickly if any of them are down. Sometimes, though, the witness goes offline. If you haven’t got a monitor on that, then you can run into other problems. For instance, you may opt to manually pause a node for maintenance. If the witness is already down, the loss combination might cause the entire cluster to go offline. This article presents a short Nagios detection script for the status of your quorum witness.

This script is useful for any cluster, not just Hyper-V clusters.

If you’re new to Nagios, then you should probably start with the How To: Monitor Hyper-V with Nagios article first. I did publish a follow-up article containing a script with some base functions for a cluster, but that script is not required to use this one.

NSClient++ Configuration

These changes are to be made to the NSClient++ files on all Windows nodes that are part of the cluster to be monitored. These instructions do not include configuring NSClient++ to operate PowerShell scripts. Please refer to the aforementioned how-to article for that.


C:\Program Files\NSClient++\nsclient.ini

If the indicated INI section does not exist, create it. Otherwise, just add the second line to the existing section.
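If you followed the PowerShell wrapping approach from the main article, the entry would look something like this (adjust the section and syntax to match however your other wrapped scripts are defined):

[/settings/external scripts/wrapped scripts]
check_clusterquorumwitness = check_clusterquorumwitness.ps1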

The NSClient++ service must be restarted after all changes to its ini file.


C:\Program Files\NSClient++\scripts\check_clusterquorumwitness.ps1

Create the file with the following contents:
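A minimal sketch of such a check, assuming the FailoverClusters PowerShell module is present on the node; it reports the state of the witness resource using standard Nagios exit codes:

# Sketch: report the cluster quorum witness state in Nagios plugin format.
try
{
    Import-Module FailoverClusters -ErrorAction Stop
    $Witness = (Get-ClusterQuorum -ErrorAction Stop).QuorumResource
    if ($Witness -eq $null)
    {
        Write-Host 'WARNING: No quorum witness is configured for this cluster'
        exit 1
    }
    if ($Witness.State -eq 'Online')
    {
        Write-Host ('OK: Quorum witness "{0}" is online' -f $Witness.Name)
        exit 0
    }
    Write-Host ('CRITICAL: Quorum witness "{0}" is {1}' -f $Witness.Name, $Witness.State)
    exit 2
}
catch
{
    Write-Host ('UNKNOWN: {0}' -f $_.Exception.Message)
    exit 3
}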


Nagios Configuration

These changes are to be made on the Nagios host. I recommend using WinSCP as outlined in our main Nagios and Ubuntu Server articles.


/usr/local/nagios/etc/objects/commands.cfg

The Hyper-V Host Commands section should already exist if you followed our main Nagios article. Add this command there. If you are not working with a Hyper-V system, then you can create any section heading that makes sense to you, or just insert the command wherever you like.
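A sketch of a matching command, assuming the command name used by the service definition in the next section:

define command{
        command_name    check_clusterquorumwitness
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_clusterquorumwitness
        }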


/usr/local/nagios/etc/objects/hypervhost.cfg

This file and section were created in the Hyper-V base scripts article. As long as it appears somewhere in one of the activated .cfg files, it will work.

This is a sample! You must use your own cluster name object! If you have multiple clusters to monitor, remember that you can place them into a Nagios hostgroup. You can then apply this service to the group rather than the individual cluster name objects. Do not assign the service to the nodes! The monitor will still work, but it’s inefficient and failures will result in many duplicate notifications.
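A sketch; “hvcluster1” is a placeholder for your own cluster name object:

define service{
        use                     generic-service
        host_name               hvcluster1
        service_description     Cluster Quorum Witness
        check_command           check_clusterquorumwitness
        }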

Nagios must be restarted after these files are modified. Remember to run these separately. Do not just copy/paste! If the first command indicates a validation failure, check your work and fix the problem before restarting the Nagios service!
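sudo service nagios checkconfig
sudo service nagios restart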

How to Monitor Hyper-V Using Nagios


System monitoring is one of the more important things we can do for our datacenters, as it helps to spot problems in the onset phase, where they can be dealt with gracefully instead of waiting for an obvious failure that requires downtime. There are a huge number of monitoring tools that work with Hyper-V on the market today. One that I’m fond of is Nagios Core. It is easily one of the most powerful systems you can get your hands on, and you just can’t beat the price (free). It does come in a commercial version which adds a lot of features, mostly in the user-friendliness department. Therein lies the rub with the free version: it’s not the easiest thing in the world to set up. This blog post will walk you through setting up Nagios Core and configuring basic monitoring to monitor Hyper-V.