The ABC of Hyper-V – 7 Steps to Get Started

Authors commonly struggle with the blank, empty starting page of a new work. So, if you’ve just installed Hyper-V and don’t know what to do with that empty space, you’re in good company. Let’s take a quick tour of the steps from setup to production-ready.

1. Make a Virtual Machine

It might seem like jumping ahead, but go ahead and create a virtual machine now. It doesn’t matter what you name it. It doesn’t matter how you configure it. It won’t be ready to use and you might get something wrong — maybe a lot of somethings — but go ahead. You get three things out of this exercise:

  • It’s what we authors do when we aren’t certain how to get started with writing. Just do something. It doesn’t matter what. Just break up that empty white space.
  • You learn that none of it is permanent. Make a mistake? Oh well. Change it.
  • You have a focused goal. You know that the VM won’t function without some more work. Instead of some nebulous “get things going” problem, you have a specific virtual machine to fix up.

If you start here, then you’ll have no network for the virtual machine and you may wind up with it sitting on the C: drive. That’s OK.

If you want to know the basic steps for how to create a virtual machine in Hyper-V Manager, start with this article.
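If you would rather start in PowerShell, a single cmdlet does the same job. A minimal sketch; the name, path, and sizes here are placeholders, not recommendations:

```powershell
# Create a throwaway Generation 2 VM with 1 GB of startup memory
# and a new 60 GB VHDX. All values are illustrative.
New-VM -Name 'FirstVM' -Generation 2 -MemoryStartupBytes 1GB `
    -NewVHDPath 'C:\VMs\FirstVM\FirstVM.vhdx' -NewVHDSizeBytes 60GB
```

Remember, nothing here is permanent; you can change every one of these values later.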

2. Install Updates and Configure the System

I try to get the dull, unpleasant things, such as operating system updates and basic host configuration, out of the way before I do anything with Hyper-V. There's no set order.

3. Configure Hyper-V Networking

For Hyper-V configuration, I always start with my networking stack. You will likely spend a lot of time on this, especially if you’re still new to Hyper-V.

For a server deployment, I recommend that you start with my overview article. It will help you to conceptualize and create a diagram of your configuration design before you build anything. At the end, the article contains further useful links to how-to and in-depth articles:

For a Windows 10 deployment using a current release build, you’ll automatically get a “Default Switch”. If you connect your virtual machines to it, they’ll get an IP, DNS, and NAT functionality without any further effort on your part. You can read more about that (and some other nifty Windows 10 Hyper-V features) in Sarah Cooley’s article:

4. Prepare Storage for Virtual Machines

Storage often needs a lot of time to configure correctly as well.

First, you need to set up the basic parts of storage, such as SAN LUNs and volumes. I'd like to give you a 100% thorough walk-through on it (perhaps at a later date), but I couldn't possibly cover more than a few options. However, I've covered a few common methods in this article. I didn't cover Fibre Channel because no two vendors are similar enough to write a good generic article. I didn't cover Storage Spaces Direct because it didn't exist yet when I wrote that piece, and I still don't have an S2D cluster of my own to instruct from.

Whatever you choose to use for storage, you need at least one NTFS or ReFS location to hold your VMs. I'm not even going to entertain any discussion about pass-through disks because, seriously, join this decade and stop with that nonsense already. I'm still recommending NTFS because I'm not quite sold on ReFS for Hyper-V yet, but ReFS will work. One other thing to note about ReFS: make sure your backup/recovery vendor supports it.

5. Configure Hyper-V Host Settings

You probably won’t want to continue using Hyper-V’s defaults for long. Storage, especially, will probably not be what you want. Let’s modify some defaults. Right-click your host in Hyper-V Manager and click Hyper-V Settings.

This window has many settings, far more than I want to cover in a quick start article. I’ll show you a few things, though.

Let’s start with the two storage tabs:

You can rehome these anywhere that you like. Note:

  • For the Virtual Hard Disks setting, all new disks created using default settings will appear directly in that folder
  • For the Virtual Machines setting, all non-disk VM files will be created in special subfolders

In my case, my host will own local and clustered VMs. I’ll set my defaults to a local folder, but one that’s not as deep as what Hyper-V starts with.
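You can set the same defaults from PowerShell. A sketch, assuming a hypothetical D:\VMs folder:

```powershell
# Point both default locations at the same shallow folder (illustrative path)
Set-VMHost -VirtualHardDiskPath 'D:\VMs' -VirtualMachinePath 'D:\VMs'
```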

Go explore a bit. Look at the rest of the default settings. Google what they mean, if you need. If you’ll be doing Shared Nothing Live Migrations, I recommend that you enable migrations on the Live Migrations tab.

6. Fix Up Your VM’s Settings

Remember that VM that I told you to create back in step one? I hope you did that because now you get to practice working with real settings on a real virtual machine. In this step, we’ll focus on the simple, direct settings. Right-click on your virtual machine and click Settings.

If you followed right through, then the VM’s virtual network adapter can’t communicate because it has no switch connection. So, jump down to the Network Adapter tab. In the Virtual Switch setting, where it says, Not connected, change it to that switch that you created in step 3.
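The equivalent PowerShell, assuming placeholder names for the VM and the switch you created in step 3:

```powershell
# Attach the VM's network adapter to an existing virtual switch
Connect-VMNetworkAdapter -VMName 'FirstVM' -SwitchName 'vSwitch'
```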

Again, poke through and learn about the settings for your virtual machine. You’ll have a lot more to look at than you did for the host. Take special notice of:

  • Memory: Each VM defaults to 1GB of dynamic memory. You can only change a few settings during creation. You can change many more now.
  • Processor: Each VM defaults to a single virtual CPU. You’ll probably want to bump that up to at least 2. We have a little guidance on that, but the short version: don’t stress out about it too much.
  • Automatic start and stop actions: These only work for VMs that won’t be clustered.

Check out the rest of it. Look up anything that seems interesting.

7. Practice with Advanced Activities and Settings

If you followed both step one and the storage location bit of step five, then that virtual machine might not be in the location that you desire. Not a problem at all. Right-click it and choose Move. On the relevant wizard page, select Move the virtual machine’s storage:

Proceed through the wizard. If you want more instruction, follow our article.
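The wizard's work can also be done in one line of PowerShell. The names here are placeholders:

```powershell
# Move all of a VM's files to a new location
Move-VMStorage -VMName 'FirstVM' -DestinationStoragePath 'D:\VMs\FirstVM'
```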

Not everything has a predefined setting, unfortunately. You’ll occasionally need to do some more manual work. I encourage you to look into PowerShell, if you haven’t already.

Changing a Virtual Machine’s VHDX Name

Let’s say that you decided that you didn’t like the name of the virtual machine that you created in step one. Or, that you were just fine with the name, but you didn’t like the name of its VHDX. You can change the virtual machine’s name very simply: just highlight it and press [F2] or right-click it and select Rename. Hyper-V stores virtual machines’ names as properties in their xml/vmcx files, so you don’t need to change those. If you put the VM in a specially-named folder, then you can use the instructions above to move it to a new one. The VHDX doesn’t change so easily, though.

Let’s rename a virtual machine’s virtual hard disk file:

  1. The virtual machine must be off. Sorry.
  2. On the virtual hard disk’s tab in the virtual machine’s settings, click Remove:
  3. Click Apply. That will remove the disk but leave the window open. We’ll be coming back momentarily.
  4. Use whatever method you like to rename the VHDX file.
  5. Back in the Hyper-V virtual machine’s settings, you should have been left on the controller tab for the disk that you removed, with Hard Drive selected. Click Add:
  6. Browse to the renamed file:
  7. Click OK.

Your virtual machine can now be started with its newly renamed hard disk.

Tip: If you feel brave, you can try to rename the file in the browse dialog, thereby skipping the need to drop out to the operating system in step 4. I have had mixed results with this due to permissions and other environmental factors.

Tip: If you want to perform a storage migration and rename a VHDX, you can wait to perform the storage migration until you have detached the virtual hard disk. The remaining files will transfer instantly and you won’t have to copy the VHDX. After the storage migration, manually move the VHDX to its new home. If the destination is on the same volume, the move will occur almost instantly. From there, you can proceed with the rename and attach operations. You can save a substantial amount of time that way.

Bonus round: All of these things can be scripted.
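As one example, the detach/rename/reattach dance above can be sketched in PowerShell. This assumes a hypothetical VM named 'FirstVM' with a single attached disk; adjust the names to match your machine:

```powershell
# 1. The VM must be off
Stop-VM -Name 'FirstVM'

# 2-3. Detach the virtual hard disk (capture its path first)
$disk = Get-VMHardDiskDrive -VMName 'FirstVM'
$oldPath = $disk.Path
$disk | Remove-VMHardDiskDrive

# 4. Rename the file (new name is illustrative)
Rename-Item -Path $oldPath -NewName 'FirstVM-C.vhdx'

# 5-7. Reattach the renamed file
$newPath = Join-Path (Split-Path $oldPath) 'FirstVM-C.vhdx'
Add-VMHardDiskDrive -VMName 'FirstVM' -Path $newPath
```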

Moving Forward

In just a few simple steps, you learned the most important things about Hyper-V. What’s next? Installing a guest operating system, of course. Treat that virtual machine like a physical machine, and you’ll figure it out in no time.

Need any Help?

If you’re experiencing serious technical difficulties, you should contact the Microsoft support team, but for general pointers and advice, I’d love to help you out! Write to me using the comment section below and I’ll get back to you ASAP!

How to Monitor Hyper-V Performance using PNP4Nagios

At a high level, you need three things to run a trouble-free datacenter (even if your datacenter consists of two mini-tower systems stuffed in a closet): intelligent architecture, monitoring, and trend analysis. Intelligent architecture consists of making good purchase decisions and designing virtual machines that can appropriately handle their load. Monitoring allows you to prevent or respond quickly to emergent situations. Trend analysis helps you to determine how well your reality matches your projections and greatly assists in future architectural decisions. In this article, we’re going to focus on trend analysis. We will set up a data collection and graphing system called “PNP4Nagios” that will allow you to track anything that you can measure. It will hold that data for four years. You can display it in graphs on demand.

What You Get

I know that intro was a little heavy. So, to put it more simply, I’m giving you graphs. Want to know how much CPU that VM has been using? Trying to figure out how quickly your VMs are filling up your Cluster Shared Volumes? Curious about a VM’s memory usage? We have all of that.

Where I find it most useful: getting rid of vendor excuses. We all have at least one vendor that claims we’re not providing enough CPU, memory, or disk, or some combination of the three. Now, you can visually determine the reasonableness of their demands.

First, the host and service screens in Nagios will get a new graph icon next to every host and service that tracks performance data. Also, hovering over one of those graph icons will show a preview of the most recent chart:


Second, clicking any of those icons will open a new tab with the performance data graph for the selected item.


Just as the Nagios pages periodically refresh, the PNP4Nagios page will update itself.

Additionally, you can do the following:

  • Click-dragging a section on a graph will cause it to zoom. If you’ve ever used the zoom feature in Performance Monitor, this is similar.
  • In the Actions bar, you can:
    • Set a custom time/date range to graph
    • Generate a PDF of the visible charts
    • Generate XML summary data
  • Create a “basket” of the graphs that you view most. The basket persists between sessions, so you can build a dashboard of your favorite charts

What You Need

Fortunately, you don’t need much to get going with PNP4Nagios.

Fiscal Cost

Let’s answer the most important question: what does it cost? PNP4Nagios does not require you to purchase anything. Their site does include a Donate button. If your organization finds PNP4Nagios useful, it would be good to throw a few dollars their way.

You’ll need an infrastructure to install PNP4Nagios on, of course. We’ll wrap that up into the later segments.


As its name implies, PNP4Nagios needs Nagios. PNP4Nagios installs alongside Nagios on the same system. We have a couple of walkthroughs for installing Nagios as a Hyper-V guest, divided by distribution.

The installation really doesn’t change much between distributions. The differences lie in how you install the prerequisites and in how you configure Apache. If you know those things about your distribution, then you should be able to use either of the two linked walkthroughs to great effect. If you’d rather see something on your exact distribution, the official Nagios project has stepped up its game on documentation. If we haven’t got instructions for your distribution, maybe they do. There are still things that I do differently, but nothing of critical importance. Also, being a Hyper-V blog, I have included special items just for monitoring Hyper-V, so definitely look at the post-installation steps of my articles.

Also, if you want to use SSL and Active Directory to secure your Nagios installation, we’ve got an article for that.

Disk Space

According to the PNP4Nagios documentation, each item that you monitor will require about 400 kilobytes once it has reached maximum data retention. That assumes that you will leave the default historical interval and retention lengths. More information can be found on the PNP4Nagios site. So, 20 systems with 12 monitors apiece will use about 96 megabytes.

PNP4Nagios itself appears to use around 7 megabytes once installed and extracted.

Downloading PNP4Nagios

PNP4Nagios is distributed on Sourceforge:

As always, I recommend that you download to a standard workstation and then transfer the files to the Nagios server. Since I operate using a Windows PC and run Nagios on a Linux system, WinSCP is my choice of transfer tool.

On my Linux systems, I create a “Downloads” directory in my home folder and place everything there. The install portion of my instructions is written using the file’s location as a starting point. So, for me, I begin with cd ~/Downloads.

Installing PNP4Nagios

PNP4Nagios installs quite easily.

PNP4Nagios Prerequisites

Most of the prerequisites for PNP4Nagios automatically exist in most Linux distributions. Most of the remainder will have been satisfied when you installed Nagios. The documentation lists them:

  • Perl, at least version 5. To check your installed Perl version: perl -v
  • RRDTool: This one will not be installed automatically or during a regular Nagios build. Most distributions include it in their mainstream repositories. Install with your distribution’s package manager.
    • CentOS and most other RedHat-based distributions: sudo yum install perl-rrdtool
    • SUSE-based systems: sudo zypper install rrdtool
    • Ubuntu and most other Debian-based distributions: sudo apt install rrdtool librrds-perl
  • PHP, at least version 5. This would have been installed with Nagios. Check with: php -v
  • GD extension for PHP. You might have installed this with Nagios. Easiest way to check is to just install it; it will tell you if you’ve already got it.
    • CentOS and most other RedHat-based distributions: sudo yum install php-gd
    • SUSE-based systems: sudo zypper install php-gd
    • Ubuntu and most other Debian-based distributions: sudo apt install php-gd
  • mod_rewrite extension for Apache. This should have been installed along with Nagios. How you check depends on whether your distribution uses “apache2” or “httpd” as the name of the Apache executable:
    • CentOS and most other RedHat-based distributions: sudo httpd -M | grep rewrite
    • Ubuntu, openSUSE, and most Debian and SUSE distributions: sudo apache2ctl -M | grep rewrite
  • There will be a bit more on this in the troubleshooting section near the end of the article, but if you’re running a more current version of PHP (like 7), then you may not have the XML extension built-in. I only ran into this problem on my Ubuntu installation. I solved it with this: sudo apt install php-xml
  • openSUSE was missing a couple of PHP modules on my system: sudo zypper install php-sockets php-zlib

If you are missing anything that I did not include instructions for, you can visit one of my articles on installing Nagios. If I haven’t got one for your distribution, then you’ll need to search for instructions elsewhere.

Unpacking and Installing PNP4Nagios

As I mentioned in the download section, I place my downloaded files in ~/Downloads. I start from there (with cd ~/Downloads). Start these directions in the folder where you placed your downloaded PNP4Nagios tarball.

  1. Unpack the tarball. I wrote these directions with version 0.6.26. Modify your command as necessary (don’t forget about tab completion!): tar xzf pnp4nagios-0.6.26.tar.gz
  2. Move to the unpacked folder: cd ./pnp4nagios-0.6.26/
  3. Next, you will need to configure the installer. Most of us can just use it as-is. Some of us will need to override some things, such as the Nagios user groups. To determine if that applies to you, open /usr/local/nagios/etc/nagios.cfg. Look for the following section:

    If both nagios_user and nagios_group are “nagios”, then you don’t need to do anything special.
    Regular configuration: ./configure
    Configuration with overrides: ./configure --with-nagios-user=naguser --with-nagios-group=nagcmd
    Other overrides are available. You can view them all with ./configure --help. One useful override would be to change the location of the emitted perfdata files to an auxiliary volume to control space usage. On my Ubuntu system, I needed to override the location of the Apache conf files: ./configure --with-httpd-conf=/etc/apache2/sites-available
  4. When configure completes, check its output. Verify that everything looks OK. Especially pay attention to “Apache Config File” — note the value because you will access it later. If anything looks off, install any missing prerequisites and/or use the appropriate configure options. You can continue running ./configure until everything suits your needs.
  5. Compile the program: make all. If you have an “oh no!” moment in which you realize that you missed something, you can still re-run ./configure and then compile again.
  6. Because we’re doing a new installation, we will have it install everything: sudo make fullinstall. Be aware that we are now using sudo. That’s because it will need to copy files into locations that your regular account won’t have access to. For an upgrade, you’d likely only want sudo make install. Please check the documentation for additional notes about upgrading. If you didn’t pay attention to the output file locations during configure, they’ll be displayed to you again.
  7. We’re going to be adding a bit of flair to our Nagios links. Enable the pop-up extension with: sudo cp ./contrib/ssi/status-header.ssi /usr/local/nagios/share/ssi/

Installation is complete. We haven’t wired it into Nagios yet, so don’t expect any fireworks.

Configure Apache Security for PNP4Nagios

If you just use the default Apache security for Nagios, then you can skip this whole section. As outlined in my previous article, I use Active Directory authentication. Really, all that you need to do is duplicate your existing security configuration to the new site. Remember how I told you to pay attention to the output of configure, specifically “Apache Config File”? That’s the file to look in.

My “fixed” file looks like this:

Only a single line needed to be changed to match my Nagios virtual directories.

Initial Verification of PNP4Nagios Installation

Before we go any further, let’s ensure that our work to this point has done what we expected.

  1. If you are using a distribution whose Apache enables and disables sites by symlinking into sites-available and you instructed PNP4Nagios to place its files there (ex: Ubuntu), enable the site: sudo a2ensite pnp4nagios.conf
  2. Restart Apache.
    1. CentOS and most other RedHat-based distributions: sudo service httpd restart
    2. Almost everyone else: sudo service apache2 restart
  3. If necessary, address any issues with Apache starting. For instance, Apache on my openSUSE box really did not like the “Order” and “Allow” directives.
  4. Once Apache starts correctly, access http://yournagiosserveraddress/pnp4nagios. Remember that you copied over your Nagios security configuration, so you will log in using the same credentials that you use on the normal Nagios site.
  5. Fix any problems indicated by the web page. Continue reloading the Apache server and the page as necessary until you get the green light:
  6. Remove the file that validates the installation: sudo rm /usr/local/pnp4nagios/share/install.php

Installation was painless on my CentOS and Ubuntu systems. openSUSE gave me more drama. In particular, it complained about “PHP zlib extension not available” and “PHP socket extension not available”. Very easy to fix: sudo zypper install php-sockets php-zlib. Don’t forget to restart Apache after making these changes.

Initial Configuration of Nagios for PNP4Nagios

At this point, you have PNP4Nagios mostly prepared to do its job. However, if you try to access the URL, you’ll get a message that says that it doesn’t have any data: “perfdata directory ‘/usr/local/pnp4nagios/var/perfdata/’ is empty. Please check your Nagios config.” Nagios needs to start feeding it data.

We start by making several global changes. If you are comparing my walkthrough to the official PNP4Nagios documentation, be aware that I am guiding you to a Bulk + NPCD configuration. I’ll talk about why after the how-to.

Global Nagios Configuration File Changes

In the text editor of your choice, open /usr/local/nagios/etc/nagios.cfg. Find each of the entries that I show in the following block and change them accordingly. Some don’t need anything other than to be uncommented:
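The original screenshot did not survive the trip to this page. For a Bulk + NPCD setup, the relevant entries follow the pattern below, based on the PNP4Nagios defaults; verify each line against your own file and the PNP4Nagios documentation before saving:

```
process_performance_data=1

service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file

host_perfdata_file=/usr/local/pnp4nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file
```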


Next, open /usr/local/nagios/etc/objects/templates.cfg. At the end, you’ll find some existing commands that mention “perfdata”. After those, add the commands from the following block. If you don’t use the initial Nagios sample files, then just place these commands in any active cfg file that makes sense to you.
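The block itself was lost in transit. These commands simply move the accumulated perfdata files into the NPCD spool directory; the versions below come from the standard PNP4Nagios examples and assume a default /usr/local/pnp4nagios install:

```
define command {
       command_name    process-service-perfdata-file
       command_line    /bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata.$TIMET$
}

define command {
       command_name    process-host-perfdata-file
       command_line    /bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata.$TIMET$
}
```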

Configuring NPCD

The performance collection method that we’re employing involves the Nagios Perfdata C Daemon (NPCD). The default configuration will work perfectly for this walkthrough. If you need something more from it, you can edit /usr/local/pnp4nagios/etc/npcd.cfg. We just want it to run as a daemon:
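If you want to start the daemon by hand to test it, the standard invocation looks like this; the path assumes the default install location:

```shell
# Run NPCD as a daemon against its default configuration file
sudo /usr/local/pnp4nagios/bin/npcd -d -f /usr/local/pnp4nagios/etc/npcd.cfg
```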

Enable it to run automatically at startup.

  • Most Red Hat and SUSE based distributions: sudo chkconfig --add npcd
  • Ubuntu and most other Debian-based distributions: sudo update-rc.d npcd defaults

Configuring Hosts in Nagios for PNP4Nagios Graphing

If you made it here, you’ve successfully completed all the hard work! Now you just need to tell Nagios to start collecting performance data so that PNP4Nagios can graph it.

Note: I deviate substantially from the PNP4Nagios official documentation. If you follow those directions, you will quickly and easily set up every single host and every single service to gather data. I didn’t want that because I don’t find such a heavy hand to be particularly useful. You’ll need to do more work to exert finer control. In my opinion, that extra bit of work is worth it. I’ll explain why after the how-to.

If you followed the path of least resistance, every single host in your Nagios environment inherits from a single root source. Open /usr/local/nagios/etc/objects/templates.cfg. Find the define host object with a name of generic-host. Most likely, this is your master host object. Look at its configuration:

Now that you’ve enabled performance data processing in nagios.cfg, this means that Nagios and PNP4Nagios will now start graphing for every single host in your Nagios configuration. Sound good? Well, wait a second. What it really means is that it will graph the output of the check_command for every single host in your Nagios configuration. What is check_command in this case? Probably check_ping or check_icmp. The performance data that those commands output is the round-trip average and the packets lost during pings from the Nagios server to the host in question. Is that really useful information? To track for four years?

I don’t really need that information. Certainly not for every host. So, I modified mine to look like this:
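The screenshot is gone, so here is a reconstruction of those two stub templates, based on the descriptions below and the standard PNP4Nagios examples. Treat the exact action_url values as assumptions to check against the PNP4Nagios documentation:

```
# generic-host keeps process_perf_data 0, so existing hosts stay untouched

# Adds only the link to a host's graphs:
define host {
        name              perf-host
        action_url        /pnp4nagios/index.php/graph?host=$HOSTNAME$
        register          0
}

# Adds the link, captures ping data, and enables the mouseover preview:
define host {
        name              perf-host-pingdata
        process_perf_data 1
        action_url        /pnp4nagios/index.php/graph?host=$HOSTNAME$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=_HOST_
        register          0
}
```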

What we have:

  • Our existing hosts are untouched. They’ll continue not recording performance data just as they always have.
  • A new, small host definition called “perf-host”. It also does not set up the recording of host performance data. However, its “action_url” setting will cause it to display a link to any graphs that belong to this host. You can use this with hosts that have graphed services but you don’t want the ping statistics tracked. To use it, you would set up/modify hosts and host templates to inherit from this template in addition to whatever host templates they already inherit from. For example: use perf-host,generic-host.
  • A new, small host definition called “perf-host-pingdata”. It works exactly like “perf-host” except that it will capture the ping data as well. The extra bit on the end of the “action_url” will cause it to draw a little preview when you mouseover the link. To use it, you will set up/modify hosts and host templates to inherit from this template in addition to whatever host templates they already inherit from. For example: use perf-host-pingdata,generic-host.

Note: When setting the inheritance:

  • perf-host or perf-host-pingdata must come before any other host templates in a use line.
  • In some instances, including a space after the comma in a use line causes Nagios to panic if the name of the host template does not also have a space (ex: you are using tabs instead of spaces on the name generic-host line). Make sure that all of your use directives have no spaces after any commas and you will never have a problem. Ex: use perf-host,generic-host.

Remember to check the configuration and restart Nagios after any changes to the .cfg files:
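For a default source install, the check and restart look like this:

```shell
# Validate the configuration, then restart Nagios if it passes
sudo /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
sudo service nagios restart
```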

Couldn’t You Just Set a Single Root Host for Inheritance?

An alternative to the above would be:

In this configuration, perf-host inherits directly from generic-host. You could then have all of your other systems inherit from perf-host instead of generic-host. The problem is that even in a fairly new Nagios installation, a fair number of hosts already inherit from generic-host. You’d need to determine which of those you wanted to edit and carefully consider how inheritance works. If you’re going to all of that trouble, it seems to me that maybe you should just directly edit the generic-host template and be done with it.

Truthfully, I’m only telling you what I do. Do whatever makes sense to you.

Configuring Services in Nagios for PNP4Nagios Graphing

You’ll get much more use out of service graphing than host graphing. Just as with hosts, the default configuration enables performance graphing for all services. Not all services emit performance data, and you may not want data from all services that do produce data. So, let’s fine-tune that configuration as well.

Still in /usr/local/nagios/etc/objects/templates.cfg, find the define service object with a name of generic-service. Disable performance data collection on it and add a stub service that enables performance graphing:
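The block is missing from this copy of the article; the change amounts to something like the following, again based on the standard PNP4Nagios examples. The elided settings in generic-service are whatever your file already contains:

```
# In the existing generic-service template, turn off blanket collection:
define service {
        name              generic-service
        # ... existing settings unchanged ...
        process_perf_data 0
}

# New stub that opts a service in to graphing:
define service {
        name              perf-service
        process_perf_data 1
        action_url        /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
        register          0
}
```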

When you want to capture performance data from a service, prepend the new stub service to its use line. Ex: use perf-service,generic-service. The warnings from the host section about the order of items and the lack of a space after the comma in the use line transfer to the service definition.

Remember to check the configuration and restart Nagios after any changes to the .cfg files:

Example Configurations

In case the above doesn’t make sense, I’ll show you what I’m doing.

Most of the check_nt services emit performance data. I’m especially interested in CPU, disk, and memory. The uptime service also emits data, but for some reason, it doesn’t use the defined “counter” mode. Instead, it’s just a graph that steadily increases at each interval until you reboot, then it starts over again at zero. I don’t find that terribly useful, especially since Nagios has its own perfectly capable host uptime graphs. So, I first configure the “windows-server” host to show the performance action_url. Then I configure the desired default Windows services to capture performance data.

My /usr/local/nagios/etc/objects/windows.cfg:

Now, my hosts that inherit from the default Windows template have the extra action icon, but my other hosts do not:

The same story on the services page: services that track performance data have an icon, but the others do not:


Troubleshooting your PNP4Nagios Deployment

Not getting any data? First of all, be patient, especially when you’re just getting started. I have shown you how to set up the bulk mode with NPCD which means that data captures and graphing are delayed. I’ll explain why later, but for now, just be aware that it will take some time before you get anything at all.

If it’s been some time, say, 15 minutes, and you’re still not getting any data, download the verify_pnp_config script from the PNP4Nagios site and transfer it to your Nagios host. I just plop it into my Downloads folder as usual. Navigate to the folder where you placed yours, then run:
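The invocation looks roughly like this for the Bulk + NPCD configuration built in this article. The flag names have varied slightly between versions, so run the script with --help to confirm them on your copy:

```shell
# Verify the Nagios and PNP4Nagios configuration for bulk+npcd mode
perl ./verify_pnp_config --mode bulk+npcd \
  --config=/usr/local/nagios/etc/nagios.cfg \
  --pnpcfg=/usr/local/pnp4nagios/etc
```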

That should give you the clues that you need to fix most any problems.

I did have one leftover problem, but only on my Ubuntu system where I had updated to PHP 7. The verify script passed everything, but trying to load any PNP4Nagios page gave me this error: “Call to undefined function simplexml_load_file()”. I only needed to install the PHP XML package to fix that: sudo apt install php-xml. I didn’t look up the equivalent on the other distributions.

Plugin Output for Performance Graphing

To determine if a plugin can be graphed, you could just look at its documentation. Otherwise, you’ll need to manually execute it from /usr/local/nagios/libexec. For instance, we’ll just use the first one that shows up on an Ubuntu system, check_apt:
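I can’t reproduce the screenshot here, but check_apt output follows this general shape; the exact counts will differ on your system:

```
$ ./check_apt
APT OK: 0 packages available for upgrade (0 critical updates).|available_upgrades=0;;;0 critical_updates=0;;;0
```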


See the pipe character (|) there after the available updates report? Then the jumble of characters after that? That’s all in the standard format for Nagios performance charting. That format is:

  1. A pipe character after the standard Nagios service monitoring result.
  2. A human-readable label. If the label includes any special characters, the entire label should be enclosed in single quotes.
  3. An equal sign (=)
  4. The reported value.
  5. Optionally, a unit of measure.
  6. A semicolon, optionally followed by a value for the warning level. If the warning level is visible on the produced chart, it will be indicated by a horizontal yellow line.
  7. A semicolon, optionally followed by a value for the critical level. If the critical level is visible on the produced chart, it will be indicated by a horizontal red line.
  8. A semicolon, optionally followed by the minimum value for the chart’s y-axis. Must be the same unit of measure as the value in #4. If not specified, PNP4Nagios will automatically set the minimum value. If this value would make the current value invisible, PNP4Nagios will set its own minimum.
  9. A semicolon, optionally followed by the maximum value for the chart’s y-axis. Must be the same unit of measure as the value in #4. If not specified, PNP4Nagios will automatically set the maximum value. If this value would make the current value invisible, PNP4Nagios will set its own maximum.
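Putting the pieces together, a check_ping result with perfdata looks like this; everything after the pipe maps to items 2 through 9 above (label, value with unit, warning, critical, minimum):

```
PING OK - Packet loss = 0%, RTA = 0.80 ms|rta=0.800000ms;3000.000000;5000.000000;0.000000 pl=0%;80;100;0
```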

This format is defined by Nagios and PNP4Nagios conforms to it. You can read more about the format in the official Nagios plugin development guidelines.

My plugins did not originally emit any performance data. I have been working on that and should hopefully have all of that work completed before you read this article.

My PNP4Nagios Configuration Philosophy

I had several decision points when setting up my system. You may choose to diverge as it meets your needs. I’ll use this section to explain why I made the choices that I did.

Why “Bulk with NPCD” Mode?

Initially, I tried to set up PNP4Nagios in “synchronous” mode. That would cause Nagios to instantly call on PNP4Nagios to generate performance data immediately after every check’s results were returned. I chose that initially because it seemed like the path of least resistance.

It didn’t work for me. I’m betting that I did something wrong. But, I didn’t get my problem sorted out. I found a lot more information on the NPCD mode. So, I switched. Then I researched the differences. I feel like I made the correct choice.

You can read up on the available modes yourself:

In synchronous mode, Nagios can’t do anything while PNP4Nagios processes the return information. That’s because it all occurs in the same thread; we call that behavior “blocking”. According to the PNP4Nagios documentation, that method “will work very good up to about 1,000 services in a 5-minute interval”. I assume that’s CPU-driven, but I don’t know. I also don’t know how to quantify or qualify “will work very good”. I also don’t know what sort of environments any of my readers are using.

Bulk mode moves the processing of data from per-return-of-results to gathering results for a while and then processing them all at once. The documentation says that testing showed that 2,000 services were processed in .06 seconds. That’s easier to translate to real-world systems, although I still don’t know the overall conditions that generated that benchmark.

When we add NPCD onto bulk mode, then we don’t block Nagios at all. Nagios still does the bulk gathering, but NPCD processes the data, not Nagios. I chose this method as it means that as long as your Nagios system is multi-core and not already overloaded, you should not encounter any meaningful interruption to your Nagios service by adding PNP4Nagios. It should also work well with most installation sizes. For really big Nagios/PNP4Nagios installations (also not qualified or quantified), you can follow their instructions on configuring “Gearman Mode”.

One drawback to this method: your “4 Hour” charts will frequently show an empty space at the right edge. That’s because they are drawn in-between collection/processing periods. All of the data will be filled in after a few minutes; you just may not have instant gratification.
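
The Bulk Mode with NPCD behavior described above is driven by a handful of nagios.cfg directives. This fragment is a sketch: the path and interval are illustrative, and the full `service_perfdata_file_template` line (plus the matching `host_perfdata_*` directives) should be taken from the PNP4Nagios documentation for your version:

```cfg
# nagios.cfg -- Bulk Mode with NPCD (illustrative values)
process_performance_data=1

# Nagios appends each check's perfdata to a spool file...
service_perfdata_file=/usr/local/pnp4nagios/var/service-perfdata
service_perfdata_file_mode=a

# ...and every 15 seconds hands the file off to a command that moves it
# into NPCD's spool directory, so NPCD (not Nagios) does the processing.
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file
```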

Why Not Just Allow Every Host and Service to be Monitored?

The default configuration of PNP4Nagios results in every single host and every single service being enabled for monitoring. From an “ease-of-configuration” standpoint, that’s tempting. Once you’ve set the globals, you literally don’t have to do anything else.

However, we are also integrating directly with Nagios’ generated HTML pages. PNP4Nagios can determine that a service has no performance data because Nagios won’t have generated anything, but the front-end simply has an instruction to add a linked icon to every single service. So, if you just globally enable it, then you’ll get a lot of links that don’t work.

If you’re the only person using your environment, maybe that’s OK. But, if you share the environment, then you’ll start getting calls wanting you to “fix” all those broken links. It won’t take long before you’re spending more time explaining (and re-explaining) that not all of the links have anything to show.

Why Not Just Change the Inheritance Tree?

If you want, you could have your performance-enabled hosts and services inherit from the generic-host/generic-service templates, then have later templates, hosts, and services inherit from those. If that works for you, then take that approach.

I chose to employ multiple inheritance as a way of overriding the default templates because it seemed like less effort to me. When I went to modify the services, I simply copied “perf-service,” to the clipboard and then selectively pasted it into the use line of every service that I wanted. That was easier for me than a selective find-and-replace operation or manual replacement. It also seems to me that this decision would be easier to revert if I make a mistake somewhere.

I can envision very solid arguments for handling this differently. I won’t argue. I just think that this approach was best for my situation.
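
As a sketch of that multiple-inheritance approach (the directive values and the names perf-service and myhost are illustrative; verify the action_url against your own PNP4Nagios installation):

```cfg
# A non-registered template whose only job is to turn on performance data.
define service {
    name                perf-service
    process_perf_data   1
    action_url          /pnp4nagios/index.php/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
    register            0
}

# Multiple inheritance: perf-service is listed first, so its settings
# override anything that generic-service also defines.
define service {
    use                 perf-service,generic-service
    host_name           myhost
    service_description CPU Load
    check_command       check_nrpe!check_load
}
```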

6 Hardware Tweaks that will Skyrocket your Hyper-V Performance


Few Hyper-V topics burn up the Internet quite like “performance”. No matter how fast it goes, we always want it to go faster. If you search even a little, you’ll find many articles with long lists of ways to improve Hyper-V’s performance. The less focused articles start with general Windows performance tips and sprinkle some Hyper-V-flavored spice on them. I want to use this article to tighten the focus down on Hyper-V hardware settings only. That means it won’t be as long as some others; I’ll just think of that as wasting less of your time.

1. Upgrade your system

This might go without saying, but every performance article I write will always include this point front-and-center. Each piece of hardware has its own maximum speed. Where that speed barrier lies in comparison to other hardware in the same category almost always correlates directly with cost. You cannot tweak a go-kart to outrun a Corvette without spending at least as much money as just buying a Corvette — and that’s without considering the time element. If you bought slow hardware, then you will have a slow Hyper-V environment.

Fortunately, this point has a corollary: don’t panic. Production systems, especially server-class systems, almost never experience demand levels that compare to the stress tests that admins put on new equipment. If typical load levels were that high, it’s doubtful that virtualization would have caught on so quickly. We use virtualization for so many reasons nowadays, we forget that “cost savings through better utilization of under-loaded server equipment” was one of the primary drivers of early virtualization adoption.

2. BIOS Settings for Hyper-V Performance

Don’t neglect your BIOS! It contains some of the most important settings for Hyper-V.

  • C States. Disable C States! Few things impact Hyper-V performance quite as strongly as C States! Names and locations will vary, so look in areas related to Processor/CPU, Performance, and Power Management. If you can’t find anything that specifically says C States, then look for settings that disable/minimize power management. C1E is usually the worst offender for Live Migration problems, although other modes can cause issues.
  • Virtualization support: A number of features have popped up through the years, but most BIOS manufacturers have since consolidated them all into a global “Virtualization Support” switch, or something similar. I don’t believe that current versions of Hyper-V will even run if these settings aren’t enabled. Here are some individual component names, for those special BIOSes that break them out:
    • Virtual Machine Extensions (VMX)
    • AMD-V — AMD CPUs/mainboards. Be aware that Hyper-V can’t (yet?) run nested virtual machines on AMD chips
    • VT-x, or sometimes just VT — Intel CPUs/mainboards. Required for nested virtualization with Hyper-V in Windows 10/Server 2016
  • Data Execution Prevention: DEP means less for performance and more for security. It’s also a requirement. But, we’re talking about your BIOS settings and you’re in your BIOS, so we’ll talk about it. Just make sure that it’s on. If you don’t see it under the DEP name, look for:
    • No Execute (NX) — AMD CPUs/mainboards
    • Execute Disable (XD) — Intel CPUs/mainboards
  • Second Level Address Translation: I’m including this for completeness. It’s been many years since any system was built new without SLAT support. If you have such a system, following every point in this post to the letter still won’t make it fast. Starting with Windows 8 on the client and Server 2016, you cannot use Hyper-V without SLAT support. Names that you will see SLAT under:
    • Nested Page Tables (NPT)/Rapid Virtualization Indexing (RVI) — AMD CPUs/mainboards
    • Extended Page Tables (EPT) — Intel CPUs/mainboards
  • Disable power management. This goes hand-in-hand with C States. Just turn off power management altogether. Get your energy savings via consolidation. You can also buy lower wattage systems.
  • Use Hyperthreading. I’ve seen a tiny handful of claims that Hyperthreading causes problems on Hyper-V. I’ve heard more convincing stories about space aliens. I’ve personally seen the same number of space aliens as I’ve seen Hyperthreading problems with Hyper-V (that would be zero). If you’ve legitimately encountered a problem that was fixed by disabling Hyperthreading AND you can prove that it wasn’t a bad CPU, that’s great! Please let me know. But remember, you’re still in a minority of a minority of a minority. The rest of us will run Hyperthreading.
  • Disable SCSI BIOSes. Unless you are booting your host from a SAN, disable the BIOSes on your SCSI adapters. They don’t do anything good or bad for a running Hyper-V host, but they slow down physical boot times.
  • Disable BIOS-set VLAN IDs on physical NICs. Some network adapters support VLAN tagging through boot-up interfaces. If you then bind a Hyper-V virtual switch to one of those adapters, you could encounter all sorts of network nastiness.

3. Storage Settings for Hyper-V Performance

I wish the IT world would learn to cope with the fact that rotating hard disks do not move data very quickly. If you just can’t cope with that, buy a gigantic lot of them and make big RAID 10 arrays. Or, you could get a stack of SSDs. Don’t get six or so spinning disks and get sad that they “only” move data at a few hundred megabytes per second. That’s how the tech works.

Performance tips for storage:

  • Learn to live with the fact that storage is slow.
  • Remember that speed tests do not reflect real world load and that file copy does not test anything except permissions.
  • Learn to live with Hyper-V’s I/O scheduler. If you want a computer system to have 100% access to storage bandwidth, start by checking your assumptions. Just because a single file copy doesn’t go as fast as you think it should, does not mean that the system won’t perform its production role adequately. If you’re certain that a system must have total and complete storage speed, then do not virtualize it. The only way that a VM can get that level of speed is by stealing I/O from other guests.
  • Enable read caches
  • Carefully consider the potential risks of write caching. If acceptable, enable write caches. If your internal disks, DAS, SAN, or NAS has a battery backup system that can guarantee clean cache flushes on a power outage, write caching is generally safe. Internal batteries that report their status and/or automatically disable caching are best. UPS-backed systems are sometimes OK, but they are not foolproof.
  • Prefer few arrays with many disks over many arrays with few disks.
  • Unless you’re going to store VMs on a remote system, do not create an array just for Hyper-V. By that, I mean that if you’ve got six internal bays, do not create a RAID-1 for Hyper-V and a RAID-x for the virtual machines. That’s a Microsoft SQL Server 2000 design. This is 2017 and you’re building a Hyper-V server. Use all the bays in one big array.
  • Do not architect your storage to make the hypervisor/management operating system go fast. I can’t believe how many times I read on forums that Hyper-V needs lots of disk speed. After boot-up, it needs almost nothing. The hypervisor remains resident in memory. Unless you’re doing something questionable in the management OS, it won’t even page to disk very often. Architect storage speed in favor of your virtual machines.
  • Set your fibre channel SANs to use very tight WWN masks. Live Migration requires a hand-off from one system to another, and the looser the mask, the longer that takes. With 2016, the guests shouldn’t crash, but the hand-off might be noticeable.
  • Keep iSCSI/SMB networks clear of other traffic. I see a lot of recommendations to put each and every iSCSI NIC on a system into its own VLAN and/or layer-3 network. I’m on the fence about that; a network storm in one iSCSI network would probably justify it. However, keeping those networks quiet would go a long way on its own. For clustered systems, multi-channel SMB needs each adapter to be on a unique layer 3 network (according to the docs; from what I can tell, it works even with same-net configurations).
  • If using gigabit, try to physically separate iSCSI/SMB from your virtual switch. Meaning, don’t make that traffic endure the overhead of virtual switch processing, if you can help it.
  • Round robin MPIO might not be the best, although it’s the most recommended. If you have one of the aforementioned network storms, Round Robin will negate some of the benefits of VLAN/layer 3 segregation. I like least queue depth, myself.
  • MPIO and SMB multi-channel are much faster and more efficient than the best teaming.
  • If you must run MPIO or SMB traffic across a team, create multiple virtual or logical NICs. It will give the teaming implementation more opportunities to create balanced streams.
  • Use jumbo frames for iSCSI/SMB connections if everything supports it (host adapters, switches, and back-end storage). You’ll improve the header-to-payload bit ratio by a meaningful amount.
  • Enable RSS on SMB-carrying adapters. If you have RDMA-capable adapters, absolutely enable that.
  • Use dynamically-expanding VHDX, but not dynamically-expanding VHD. I still see people recommending fixed VHDX for operating system VHDXs, which is just absurd. Fixed VHDX is good for high-volume databases, but mostly because they’ll probably expand to use all the space anyway. Dynamic VHDX enjoys higher average write speeds because it completely ignores zero writes. No defined pattern has yet emerged that declares a winner on read rates, but people who say that fixed always wins are making demonstrably false assumptions.
  • Do not use pass-through disks. The performance is sometimes a little bit better, but sometimes it’s worse, and it almost always causes some other problem elsewhere. The trade-off is not worth it. Just add one spindle to your array to make up for any perceived speed deficiencies. If you insist on using pass-through for performance reasons, then I want to see the performance traces of production traffic that prove it.
  • Don’t let fragmentation keep you up at night. Fragmentation is a problem for single-spindle desktops/laptops, “admins” that never should have been promoted above first-line help desk, and salespeople selling defragmentation software. If you’re here to disagree, you better have a URL to performance traces that I can independently verify before you even bother entering a comment. I have plenty of Hyper-V systems of my own on storage ranging from 3-spindle up to >100 spindle, and the first time I even feel compelled to run a defrag (much less get anything out of it) I’ll be happy to issue a mea culpa. For those keeping track, we’re at 6 years and counting.
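
The jumbo frame point above is easy to quantify. A back-of-the-envelope sketch, assuming 40 bytes of IP and TCP headers and roughly 38 bytes of Ethernet framing, preamble, and inter-frame gap per packet (exact overhead varies with options and encapsulation):

```python
def frame_efficiency(mtu, ip_tcp_overhead=40, ethernet_overhead=38):
    """Fraction of on-the-wire bits that carry actual payload."""
    payload = mtu - ip_tcp_overhead   # TCP payload left after IP + TCP headers
    wire = mtu + ethernet_overhead    # frame, preamble, and inter-frame gap
    return payload / wire

print("1500 MTU: {:.1%}".format(frame_efficiency(1500)))  # 94.9%
print("9000 MTU: {:.1%}".format(frame_efficiency(9000)))  # 99.1%
```

A few percentage points may not sound like much, but on a saturated iSCSI or SMB link it translates directly into extra usable throughput, plus fewer packets for the endpoints to process.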

4. Memory Settings for Hyper-V Performance

There isn’t much that you can do for memory. Buy what you can afford and, for the most part, don’t worry about it.

  • Buy and install your memory chips optimally. Multi-channel memory is somewhat faster than single-channel. Your hardware manufacturer will be able to help you with that.
  • Don’t over-allocate memory to guests. Just because your file server had 16GB before you virtualized it does not mean that it has any use for 16GB.
  • Use Dynamic Memory unless you have a system that expressly forbids it. It’s better to stretch your memory dollar farther than wring your hands about whether or not Dynamic Memory is a good thing. Until directly proven otherwise for a given server, it’s a good thing.
  • Don’t worry so much about NUMA. I’ve read volumes and volumes on it. Even spent a lot of time configuring it on a high-load system. Wrote some about it. Never got any of that time back. I’ve had some interesting conversations with people that really did need to tune NUMA. They constitute… oh, I’d say about .1% of all the conversations that I’ve ever had about Hyper-V. The rest of you should leave NUMA enabled at defaults and walk away.

5. Network Settings for Hyper-V Performance

Networking configuration can make a real difference to Hyper-V performance.

  • Learn to live with the fact that gigabit networking is “slow” and that 10GbE networking often has barriers to reaching 10Gbps for a single test. Most networking demands don’t even bog down gigabit. It’s just not that big of a deal for most people.
  • Learn to live with the fact that a) your four-spindle disk array can’t fill up even one 10GbE pipe, much less the pair that you assigned to iSCSI and that b) it’s not Hyper-V’s fault. I know this doesn’t apply to everyone, but wow, do I see lots of complaints about how Hyper-V can’t magically pull or push bits across a network faster than a disk subsystem can read and/or write them.
  • Disable VMQ on gigabit adapters. I think some manufacturers are finally coming around to the fact that they have a problem. Too late, though. The purpose of VMQ is to redistribute inbound network processing for individual virtual NICs away from CPU 0, core 0 to the other cores in the system. Current-model CPUs are fast enough to service many gigabit adapters on a single core, so at gigabit speeds VMQ provides little benefit, and buggy drivers have made it actively harmful.
  • If you are using a Hyper-V virtual switch on a network team and you’ve disabled VMQ on the physical NICs, disable it on the team adapter as well. I’ve been saying that since shortly after 2012 came out and people are finally discovering that I’m right, so, yay? Anyway, do it.
  • Don’t worry so much about vRSS. RSS is like VMQ, only for non-VM traffic. vRSS, then, is the projection of VMQ down into the virtual machine. Basically, with traditional VMQ, processing of the VMs’ inbound traffic is distributed across cores in the management OS, but then each guest still processes its own data on vCPU 0. vRSS splits traffic processing across vCPUs inside the guest once it gets there. The “drawback” is that distributing processing and then redistributing processing causes more processing. So, the load is nicely distributed, but it’s also higher than it would otherwise be. The upshot: almost no one will care. Set it or don’t set it; it’s probably not going to impact you a lot either way. If you’re new to all of this, you’ll find an “RSS” setting on the network adapter inside the guest. If that’s enabled in the guest (it’s off by default) and VMQ is on and functioning in the host, then you have vRSS. Woohoo.
  • Don’t blame Hyper-V for your networking ills. I mention this in the context of performance because your time has value. I’m constantly called upon to troubleshoot Hyper-V “networking problems” because someone is sharing MACs or IPs or trying to get traffic from the dark side of the moon over a Cat-3 cable with three broken strands. Hyper-V is also almost always blamed by people that just don’t have a functional understanding of TCP/IP. More wasted time that I’ll never get back.
  • Use one virtual switch. Multiple virtual switches cause processing overhead without providing returns. This is a guideline, not a rule, but you need to be prepared to provide an unflinching, sure-footed defense for every virtual switch in a host after the first.
  • Don’t mix gigabit with 10 gigabit in a team. Teaming will not automatically select 10GbE over the gigabit. 10GbE is so much faster than gigabit that it’s best to just kill gigabit and converge on the 10GbE.
  • 10x gigabit cards do not equal 1x 10GbE card. I’m all for only using 10GbE when you can justify it with usage statistics, but gigabit just cannot compete.

6. Maintenance Best Practices

Don’t neglect your systems once they’re deployed!

  • Take a performance baseline when you first deploy a system and save it.
  • Take and save another performance baseline when your system reaches a normative load level (basically, once you’ve reached its expected number of VMs).
  • Keep drivers reasonably up-to-date. Verify that settings aren’t lost after each update.
  • Monitor hardware health. The Windows Event Log often provides early warning symptoms, if you have nothing else.


Further reading

If you carry out all (or as many as possible) of the above hardware adjustments, you will see a considerable jump in your Hyper-V performance. That I can guarantee. However, for those who don’t have the time, the patience, or the ability to make the necessary investment, Altaro has developed an e-book just for you. Find out more about it here: Supercharging Hyper-V Performance for the time-strapped admin.


Hyper-V and the Small Business: 9 Tips for Host Provisioning


The category of questions that I most commonly field relates to host design. Provisioning is a difficult operation for small businesses; they don’t do it often enough to obtain the same level of experience as a large business, and they don’t have the finances to absorb either an under-provision or an over-provision. If you don’t build your host large enough, you’ll be buying a new one while the existing one still has life in it. If you buy too much, you’ll be wasting money that could have been used elsewhere. Unfortunately, there’s no magic formula for provisioning, but you can employ a number of techniques to guide you to a right-sized build.

1. Do Not Provision Blindly

Do not buy a pre-packaged build, do not have someone on a forum recommend their favorite configuration, and do not simply buy something that looks good. Vendors are only interested in profit margins, forum participants only know their own situations, and no one can adequately architect a Hyper-V host in a void.

2. Have a Budget in Mind

Everyone hates it when the vendor asks, “How much do you want/have to spend?” I completely understand why you don’t want to answer that question at all, and I agree with the sentiment. We all know that if you say that you have $5,000 to spend, your bill will somehow come to $5,027. Unless you have a history with the vendor in question, you don’t know if the vendor is truly sizing against your budget or if they’re finding the highest-margin solution that more or less coincides with what you said you were willing to pay. That said, even if you don’t give the answer, you must know the answer. That answer must truly be an amount that you’re willing to spend; don’t say that you’ll spend $5,000 if what you’re truly able to spend is $3,000. I worked for a vendor of solid repute that earned their reputation, so I can tell you from direct experience that it’s highly unlikely that you’ll ever be sold a system that is meaningfully smaller than what you can afford, even if your reseller isn’t trying to oversell. Every system that I ever architected for a small business made some compromises to fit within budget. The more money they could spend, the fewer compromises were necessary.

3. Storage and Memory are Your Biggest Concerns

Part of the reason that virtualization works at all is because modern CPU capability greatly outmatches modern CPU demand. I am one of the many people that can remember days when conserving CPU cycles was important, but I can clearly see that those days are long gone. Do not try to buy a system that will establish a 1-to-1 ratio of physical CPUs to virtual CPUs. If you’re a small business that will only have a few virtual machines, it would be difficult to purchase any modern server-class hardware that doesn’t have enough CPU power. For you, the generation of the CPU is much more important than the core count or clock speed.

Five years ago, I would (and did) say that memory was your largest worry. That’s no longer true, especially for the small business. DDR3 is substantially cheaper than DDR2, and, with only a few notable exceptions, the average system’s demand on memory has not increased as quickly as the cost has decreased. For the notable exceptions (Exchange and SharePoint), the small business can likely get better pricing by choosing a cloud-based or non-Microsoft solution as opposed to hosting these products on-premises. Even if you choose to host them in-house, a typical server-class system with 32 GB of RAM can hold an 8 GB SharePoint guest, an 8 GB Exchange guest, and still have a good 14 GB of memory left over for other guests (assuming 2 GB for the management operating system). Even a tight budget for server hardware should be able to accommodate 32 GB of RAM in a host.

Storage is where you need to spend some time applying thought. For small businesses that won’t be clustering (rationale in my previous post), these are my recommendations:

  • Internal storage provides the best return for your dollar.
  • For the same dollar amount, prefer many small and fast disks over a few large and slow disks.
  • A single large array containing all of your disks is superior to multiple arrays of subsets.
  • Hardware array controllers are worth the money. Tip: if the array controller that you’re considering offers a battery-backed version, it is hardware-based. The battery is worth the extra expense.

Storage sizing is important, but I am intentionally avoiding going any further about it in this article because I want it to be applicable for as many small businesses as possible. There are two takeaways that I want you to glean from this point:

  • CPU is a problem that mostly solves itself and memory shouldn’t take long to figure out. Storage is the biggest question for you.
  • The storage equation is particular to each situation. There is no one-size-fits-all solution. There isn’t a one-size-fits-most solution. There isn’t a typical, standard, usual, or regular solution that is guaranteed to be appropriate for you. Vendors who tell you otherwise are either very well-versed in the particular vertical market that you operate in (and will have the credentials and references to prove it), or they’re trying to get the most money out of you for the minimum amount of time invested on their part.

Networking is typically the last thing a small business should be worried about. As with storage sizing, I can’t be specific enough to cover everyone that I’d like this post to be relevant to, but it’s safe to say that 2 to 6 gigabit Ethernet connections per host are sufficient.

4. Do not be Goaded or Bullied into 10 Gigabit Ethernet

I won’t lie, 10 GbE is really nice. It’s impressive to see it in operation. But, the rest of the truth is that it’s unnecessary in most small businesses, and in lots of medium businesses too. You can grow to a few thousand endpoints before it even starts to become necessary as an interswitch backbone.

A huge part of the reasoning is simple economics:

  • A basic business-class 20-port gigabit switch can be had for around $200 USD. You can reasonably expect to acquire gigabit network adapters for $50 or less per port.
  • A basic 12-port 10GbE switch costs at least $1,500 USD. Adapters will set you back at least $250 per port.

When you’re connecting five server-class hosts, $1,500 for a switch and $500 apiece for networking doesn’t seem like much. When you’re only buying one host for $5,000 or less, the ratio isn’t nearly as sensible. That price is just for the budget equipment. Since 10GbE adapters can move network data faster than modern CPUs can process it, offloading and VMQ technologies are quite important to get the most out of 10GbE networking. That means that you’re going to want something better than just the bare minimum.
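
The per-port arithmetic behind that ratio is worth spelling out. A quick sketch using the article’s example prices (your quotes will differ):

```python
# Per-port cost in USD: switch cost divided across its ports, plus one adapter port.
gigabit_port = 200 / 20 + 50    # budget 20-port gigabit switch, ~$50 adapter port
tengig_port = 1500 / 12 + 250   # budget 12-port 10GbE switch, ~$250 adapter port

print("gigabit: ${:.0f}/port".format(gigabit_port))  # $60
print("10GbE:   ${:.0f}/port".format(tengig_port))   # $375
print("ratio:   {:.2f}x".format(tengig_port / gigabit_port))  # 6.25x
```

A better-than-minimum 10GbE adapter with solid offloading support widens that gap further.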

What might even be more relevant than price is the fact that most people don’t use as much network bandwidth as they think they do. The most common tests do not even resemble typical network utilization, which can fool administrators into thinking that they don’t have enough. If you need to verify your usage, I’ve written an article that can help you do just that with MRTG. This leads into a very important point.

5. You Need to Know What You Need

Unless you’re building a host for a brand-new business, you’ve got an existing installation to work with. Set up Performance Monitor or any monitoring tool of your choice and find out what your systems are using. Measure CPU, disk, memory, and networking. Do not even start trying to decide what hardware to buy until you have some solid long-term metrics to look at. I’m surprised at how many messages I get asking me to recommend a hardware build that have little or no information about what the environment is. I’m guessing that the questioners are just as surprised when I respond, “I don’t know.” It doesn’t take a great deal of work to find out what’s going on. Do that work first.

6. Build for the Length of the Warranty

Collecting data on your existing systems only tells you what you need to know to get through the first day. You’re probably going to need more over time. How much more depends on your environment. Some businesses have reached equilibrium and don’t grow much. Others are just kicking things off and will triple in size in a few months. Since those truly new environments are rare, I’m going to aim this next bit at that gigantic majority that is building for the established institutions. Decide how much warranty you’re willing to buy for the new host and use that as your measuring stick for the rest of it. How you proceed depends upon growth projections:

  • If system needs won’t grow much (for example, 5-10% annually), then build the system with a long warranty period in mind. If the business has been experiencing a 5% average annual growth rate and is currently using 300 GB of data, a viable option is to purchase a system with 500 GB of usable storage with a 5-year warranty.
  • If system needs will grow rapidly, you have two solid options:
    • Buy an inexpensive system with a short warranty (1-3 years). Ensure that it’s understood that this system is not expected to live long. If decision-makers appear to be agreeing without understanding, you’re better off getting a bigger system.
    • Buy a system that’s a little larger with a longer warranty (5 years). Plan a definite growth point at which you will scale out to a second host. Scaling out can become more than twice as expensive as the original, especially when clustering is a consideration, so do not take this decision lightly.
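
The slow-growth example in the first bullet can be sanity-checked with a compound-growth projection (the numbers come from the example; the function name is mine):

```python
def projected_need(current_gb, annual_growth, years):
    """Compound current usage forward across the planned warranty period."""
    return current_gb * (1 + annual_growth) ** years

# 300 GB today at 5% annual growth over a 5-year warranty:
print("{:.0f} GB".format(projected_need(300, 0.05, 5)))  # 383 GB, so 500 GB fits
```

Run the same projection with your own measured growth rate before settling on a size; the answer is very sensitive to the growth figure you feed it.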

Most hardware vendors will allow warranty extensions, which gives you some incentive to oversize. If you make a projection for a five-year system and it isn’t at capacity at the end of those five years, extending the warranty helps to maximize the initial investment.

In case it’s not obvious, future projections are much easier to perform when you have a solid idea of the environment’s history. There’s more than one reason that I make such a big deal out of performance monitoring.

7. Think Outside the One Box

Small businesses typically only need one physical host. There isn’t any line at which you cross over into “medium” business and magically need a second host. There isn’t any concrete relationship between business size and the need for infrastructure capacity. Just as I preach against the dangers of jumping into a cluster needlessly, I am just as fervent about not having less than what is adequate. Scaling out can undoubtedly be expensive, but when it’s time, it’s time.

Clustering isn’t necessarily the next step up from a single host. Stagger your host purchases across years so that you have an older system that handles lighter loads and a newer system that takes on the heavier tasks. What’s especially nice about having two Hyper-V hosts is that you can have two domain controllers on separate equipment. Even though I firmly stand behind my belief that most small businesses operate perfectly well with a single domain controller, I am just as certain that anyone who can run two or more domain controllers on separate hosts without hardship will benefit from the practice.

8. Maybe Warranties are Overrated

I’ve been in the industry long enough to see many hardware failures, some of legendary proportions. That doesn’t change the fact that physical failures are very nearly a statistical anomaly. I work in systems administration; my clients and workers from other departments never call me just to be social. Anyone in my line of work deals with exponentially more failures than anyone outside it. So, while I will probably always counsel you to spend extra so that you can get new equipment, or something new enough that you can still acquire a manufacturer’s warranty on it, I can also acknowledge that there are alternatives when the budget simply won’t allow for it.

In my own home, most of the equipment that we use is not new. As a technophile, I have more computers than people, and that’s before you start counting the devices that don’t have keyboards or the units that are only used for my blogging. I rarely buy anything new. I am unquestionably a fan of refurbished and cast-off systems. It’s all very simple to understand: I want to own more than I can technically afford to own, and this practice satisfies both my desire for tech and my need for frugality. Is that any way to run a business? Well…

Cons of Refurbished and Used Hardware

On the one hand, no, this is not a good idea. If any of this fails, I have to either repair it, live without it, or replace it. If you don’t have the skills for the first, the capacity for the second, or the finances for the third, that choice leaves your business in the lurch. If you’d have to make that choice, then no, don’t do this. Another concern is that if you’re doing this to be cheap, a lot of cheap equipment doesn’t meet the criteria for hardware compatibility listing and might be more trouble than the savings are worth. And of course, even if it’s good enough for today’s version, it might not work with tomorrow’s version.

For another thing, I’ve seen a lot of really cheap business owners use equipment that they had to repair all the time and that was so inefficient that it impacted worker productivity. That sort of thing is a net loss. Avoid these conditions, even if it means spending more money. Remember what I said earlier about compromises? Sometimes the only viable compromise is to spend more money on better hardware.

If you go the route of hardware that doesn’t carry a warranty, you need to be prepared to replace it at all times. Warranty repair turnaround is commonly next-business-day in this era, whereas buying replacement hardware can involve days or even weeks of lead time. Keeping replacement hardware on hand can cost more than simply buying new with a warranty.

Pros of Refurbished and Used Hardware

On the other hand, I spent less to acquire many of these things than their original owners did on their warranties, and, with the law of averages, most of my refurbished equipment has never failed. I quite literally have more for less. Something else to remember is that a lot of refurbished hardware is very new. Sometimes they’re just returns that can no longer be sold as “new”. You can often get original manufacturer warranties on them. The only downside to purchasing that sort of hardware is that you don’t get to pick exactly what you want. For the kind of savings that can be had, so what?

In case you’re curious, all of the places that I’ve worked pushed a very hard line of only selling new equipment. “Used” and “refurbished” carry a very strong negative connotation that no one I worked for wanted to be attached to. However, I didn’t work for anyone that would turn away a client that was using used or refurbished equipment that they acquired independently. I’ve encountered plenty of it in the field. It didn’t fail any more often than new equipment did. I’ll say that I do feel more comfortable about “refurbished” than “used”. I also know what it’s like to be looking at a tight budget and needing to make tough decisions.

I will say that I would prefer to avoid used hardware for a Hyper-V host. I understand that it can be enticing for the very small business budget so I will stop short of declaring this a rule. It’s reasonable to expect used hardware to be unreliable and short-lived. Used hardware will consume more of your time. Operating from the assumption that your time has great value, I encourage you to consider used hardware as a last resort.

9. Architect for Backup

I expect point #8 to stir a bit of controversy. I fully expect many people to disagree with any notion of non-new equipment, especially those that depend on margins from sales of new hardware. I don’t mind the fight; until someone comes up with 100% failure-free new hardware, there will never be a truly airtight case for only buying new.

If you want to guarantee yourself peace of mind, backup is where you need to focus. I may not know the statistics around failures of new versus used or refurbished equipment, but I know that all hardware has a chance of breaking. What doesn’t break due to defect or negligence can be destroyed by malice or happenstance, so you can never rely too much upon the quality of a purchase.

What this means is that when you’re deciding how many hard disks to buy and the size of the network switch to plug the host into, you also need to be thinking about where you’re going to copy the data every night. External hard disks are great, as long as they’re big enough. Offsite service providers are fine, as long as you know that your Internet bandwidth can handle it. If you don’t know this in advance, you run the risk of needing to sacrifice something in your backup rotations. I have yet to see any sacrifice in this aspect that was worth it.
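As a rough sanity check before committing to an offsite provider, compare your nightly change volume against your upload bandwidth. A minimal PowerShell sketch; the 60 GB nightly figure and 50 Mbps uplink are made-up values for illustration:

```powershell
# Hypothetical figures: substitute your own nightly backup size and uplink speed
$nightlyBackupGB = 60    # data changed per night, in gigabytes
$uploadMbps      = 50    # usable upstream bandwidth, in megabits per second

# Convert gigabytes to megabits: GB * 1024 (MB per GB) * 8 (bits per byte)
$megabits = $nightlyBackupGB * 1024 * 8
$hours    = [math]::Round($megabits / $uploadMbps / 3600, 1)

"Estimated nightly upload time: $hours hours"
# If that doesn't fit comfortably inside your backup window, an
# Internet-based offsite rotation will force a sacrifice somewhere.
```

If the estimate spills past the window, that’s your cue to look at seeded transfers, a shorter offsite interval for only the most critical data, or a bigger pipe.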

8 Hyper-V Best Practices for Small Businesses


I often lament the way that small businesses are treated by the majority of technical writers and a significant number of software companies. You know the ones I’m talking about — those who think of “small business” as a nebulous concept that means “something under a few thousand employees”. They take what they know and just scale it down to some arbitrary minimum that in no way represents a small business.

Fortunately, I’ve spent most of my career working with and for truly small businesses, right down to the loners building up companies out of their basements. I know the struggles of insufficient time and budget shortfalls and lack of expertise. I also know what works and what doesn’t. What I intend to share with you is the distillation of all that experience into the strategy that I would follow if I ever struck out on my own to build a startup. This is only the first of my articles targeting the small business environment. I’ve got plenty for you.

To begin, I want to introduce a set of 8 Hyper-V best practices for small businesses that are important to consider.

1. Backup is Your Highest Priority

In my second full-time IT position, one of my very first responsibilities was assisting small business customers across the country with backup problems. I held that position for five years. During that period, I accumulated stories that would give you nightmares. I’ve had to say the words, “I’m sorry, but your data is unrecoverable,” more than once. The most disheartening thing I see is forum posts from small businesses wanting to know what the “best free backup” program is. “Best” is not where they’re drawing the line; it’s drawn at “free”. Don’t get me wrong; free is good — but only if it meets every single one of your needs. Skimping on backup can very easily mean placing your entire business at risk, unless your business is able to withstand a complete data loss event. Even if it can withstand that today, are you not planning to grow?

I expect you to ask, “How can I know if a particular backup solution meets my needs?” That’s a wonderful question, and I’ll answer it even if you didn’t ask. Here’s how you know:

  • Every last piece of data that you care about must be backed up for at least as long as you’ll ever care about it. If you’re in a particular industry, such as finance, make sure that you understand the regulations around data retention that apply to you. Regulations aren’t always the deciding factor; I once had to recover a 7-year-old e-mail to aid a plaintiff’s case. Some free backup applications have a limit on retention. When evaluating an application on these grounds, remember that “retention” only applies to the age of the backup itself; if the data that was backed up last night is 5 years old, then the backup age is less than a day.
  • Any backup data in the same geographical location as the source data is forfeit to the whims of chance. You need some way to move it offsite, preferably out of the range of any natural disaster that might destroy your main location. For small businesses, there is a limit to the effectiveness of very long-range offsite storage. If your town is entirely destroyed by a flood and all of your customers are in that town, then how far do you really need to move your data? If your business is an insurance agency, you’ll likely answer that question differently than would someone who owns a landscaping outfit. Of course, your own business is likely insured, so you’ll need to protect any data that would be needed to file any claims in the event of a business-ending disaster. The takeaway of this point is that the backup solution that you use must allow you to take your data offsite and you must follow through.
  • The backed-up data must be recoverable. I often hear backup referred to as “an insurance policy”. That analogy is awful. People go out of their way to avoid using insurance policies because filing a claim often leads to premium increases, and in the worst cases, policy cancellations. Practice data restoration often. The primary benefit is that it proves that your backup data is good, your backup application is good, and that your skills are good. I have worked with several applications through the years that take absolutely lovely backups that no one in the world can restore data from. I can’t imagine anything that would be more useless. You won’t know if yours is one of those without trying it out. The second benefit is that practice makes perfect. You don’t want to be learning how to restore data in the middle of a crisis.
  • You can’t be the only one that knows. For the most part, this isn’t entirely about the backup application. Working from the assumption that your business is important to more people than you, such as those that your income supports, make sure that you leave some form of documentation that would allow someone else to recover your data in the event that you are… unavailable. This affects your backup application choices in that you don’t want some obscure application that no one else can figure out.
  • Encryption matters. You only want to be mentioned in news headlines for good things. “Small Business Corp Loses Customer Records on Unencrypted Data Tapes” does not qualify.

I certainly understand that cost is important, but it is secondary at most. Give up something else before you skimp on backup.
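One small piece of proving your backups are worth their cost can be automated: verifying that last night’s backup actually exists and is fresh. A hedged sketch; the backup path is hypothetical and only stands in for wherever your application writes its output:

```powershell
# Hypothetical backup target path; substitute your own
$backupRoot  = 'E:\Backups'
$maxAgeHours = 26   # allow a little slack past a 24-hour cycle

# Find the most recently written file under the backup root
$newest = Get-ChildItem -Path $backupRoot -Recurse -File |
    Sort-Object LastWriteTime -Descending |
    Select-Object -First 1

if (-not $newest -or $newest.LastWriteTime -lt (Get-Date).AddHours(-$maxAgeHours)) {
    Write-Warning "No backup newer than $maxAgeHours hours found in $backupRoot"
} else {
    "Most recent backup: $($newest.FullName) at $($newest.LastWriteTime)"
}
```

Note what this does and doesn’t prove: it confirms a backup ran, not that it can be restored. Only a practice restore proves that.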

2. Backup is Your Redundancy

Despite the word “backup” in the title, backup is not the target of this point. “Redundancy” is. I know at least a few people that seem to derive great pleasure from terrorizing forum posters over a “single point of failure”. Ignore them.

With Hyper-V being the focal technology of this post, we’ll start with that. In the “all businesses are the same only different sizes” philosophy, you’ll have at least two clustered hosts connected to fully redundant shared storage with every component connected to a minimum of two switches. That sounds good, and believe me, it is. But it’s more than twice as expensive as a single host with internal storage connected to a single switch. “Single point of failure!” scream the people that don’t have to deal with your budgetary issues. Well, I took a course last year from an instructor who continually spoke of the “single point of success.” “Gives it a more positive tone, don’t you think?” he asked when I inquired about it. He’s right. Fewer than 2% of computer problems are caused by hardware failure and fewer parts means less complexity and fewer things to troubleshoot and maintain. If you have a good backup and you know how to use it, take comfort in the knowledge that the odds are heavily in your favor.

3. Quality Matters

A core requirement to making the previous point work is choosing quality at every point. If you save $500 on a shiny new mid-tower to run Hyper-V but have to spend three days in the forums and several hours on the phone or e-mail with technical support because Hyper-V won’t start, that is three days and several hours that you weren’t out drumming up business. If you have trouble getting it to work, you’re probably going to have trouble keeping it working.

Hardware components fail sometimes, and sometimes they’re shipped out new in non-working condition. It’s rare, but unavoidable. What you need to concern yourself with is choosing the equipment most likely to succeed. Start with published hardware compatibility lists. Ask around. Perform an Internet search for “<vendorname> Hyper-V problems”.

Oh, and quality isn’t important just for hardware.

4. Get Help

If I were to start a new business, I probably would not hire any technical assistance to start. I’m fairly certain I know what I’m doing. However, if I started putting in a noticeable amount of my time doing technical support, I wouldn’t hesitate to pick up the phone. Neither should you. Even if you’re knowledgeable today, running a business is going to consume a lot of time. You’re eventually going to have to choose between maintaining your top-tier technical status and running a business. At some point, you’re going to have to pass the burden on.

Of course, as a small business, you’re not hiring any full-time engineers. But, there are many small technology services companies out there that cater to the small business. Find them.

In your search, get help with that too. Point #3 applies here just as strongly as it does to hardware. It’s a sad fact that many technology service companies are terrible with technology and owe their entire existence to customers that have even less knowledge. A few pointers:

  • If you’re not technically inclined, attend local small business expos and networking sessions where you can meet other local small business owners. They love to share their horror stories and most will gladly make recommendations.
  • If you are technically inclined, you can spend a bit of time researching technical subjects, at least enough to keep abreast of the current state of the art. Sometimes, you’ll just find people that will flat-out tell you how to spot frauds. For instance, if you have a service provider insisting that you can’t oversubscribe CPU, must turn off Dynamic Memory, and can never use dynamically expanding virtual hard disks, that provider is uneducated and/or trying to sell you hardware that you don’t need. I have one tip for those times when you discover that your provider is clueless and/or unethical: do not attempt to argue with them or help them get better. Fire them immediately and move on. In almost twenty years of being in this industry, I have never seen any evil or willfully ignorant technology provider come to the light.
  • Good engineers do not work for cheap. When I left the provider industry to move to internal IT, I was charging a higher rate than anyone else that I knew of. However, I charged individual customers noticeably fewer hours than my competitors, so I needed that higher rate to stay afloat. Do you want the firm that charges $60 per hour and will need a permanent office of their own in your building or the firm that charges $200 per hour and will be done in two days?

5. Most “Best Practices” were Written by People Who Do Not Believe that You Exist

As far as the technical world is concerned, you might as well name your company “The Tooth Fairy” or “Santa Claus”. Two or more domain controllers on separate physical units? Who is going to pay for all of that? Multiple Internet providers to one building? Not a chance. Redundant everything? Maybe if you win the lottery — assuming you can ever afford a ticket.

Most “Best Practices” lists are built on solid reasoning and they are important to understand. With solid understanding, you can also discern where your small business can deviate. Hopefully, you’ll have a good provider or assistant (#4) that can guide you in the proper direction. If your provider is blindly quoting and adhering to a list they printed off of someone else’s website, that’s a bad sign. The word “Why” is the most important tool in your kit in these situations. A provider’s explanation is less important than the way that they answer. If they don’t seem certain, it’s because they aren’t. If they have a canned, practiced answer that in no way involves or references your situation, they do not understand what they are doing.

6. Use Hyper-V in a Way that Saves Money

I used to work with a fairly standard build for small business: one server installation that did everything. Best practice? Absolutely not. But, when clients are working with annual technology budgets smaller than the price of a first-class plane ticket, you make do. Nowadays, choices are simpler with things like Office 365 and Exchange Online. The recent pricing shake-up that Microsoft enacted with its OneDrive product likely, and deservedly, has small business owners nervous about signing on with any Microsoft cloud service. However, Small Business Server is no longer an option (and frankly, was never a wonderful option) and the cost of acquiring and maintaining some Microsoft servers on-premises is prohibitive.

Considerations for hosting on-premises:

  • If you can’t get reliable, affordable Internet access, skip cloud providers. They don’t care if you can’t connect.
  • If you’re going to have any servers on-premises at all, Active Directory is a must-have. Only go workgroup if you’ll only be using personal computers.
  • SQL Server Express is far cheaper to host in-house than almost anything a cloud provider can offer. However, treat this like #1; don’t buy less than you need and be ready for growth.
  • Non-Microsoft software may be cheaper than cloud solutions, especially if you’d have to rent a virtual machine or dedicated host.

Whatever servers you choose to bring in-house, your goal should be to fit them all in Hyper-V. When I had to do one-host installations, I always had to worry about compatibility problems with all of those applications, including Active Directory, sitting on one unit. You don’t have to do that. You may have to purchase more Windows Server licenses, but you do not need to purchase more hosts.

Oh, and even though I already mentioned it: oversubscribe CPU, turn on Dynamic Memory (where appropriate), and use dynamically expanding disks (where appropriate).
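For reference, Dynamic Memory and dynamically expanding disks are both switches you can flip with the Hyper-V PowerShell module. A sketch; the VM name, paths, and sizes are illustrative, not recommendations:

```powershell
# Enable Dynamic Memory on an existing VM
# (the VM must be off to change its memory configuration)
Set-VMMemory -VMName 'SmallBizApp' -DynamicMemoryEnabled $true `
    -MinimumBytes 512MB -StartupBytes 1GB -MaximumBytes 4GB

# Create a dynamically expanding VHDX; it claims a 100 GB maximum
# but only consumes physical space as data is actually written
New-VHD -Path 'D:\VHDs\SmallBizApp-Data.vhdx' -SizeBytes 100GB -Dynamic
```

The “where appropriate” caveat stands: a disk-heavy database or a memory-sensitive application may still justify fixed allocations, but the defaults above save money for typical small-business loads.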

The reason I wrote this point is that many people are going to tell you that the purpose of Hyper-V is so that you can buy more hosts and switches and storage to “avoid a single point of failure!”. For you, virtualization is a money-saving move. Backup is your redundancy.

7. Small Is OK

I would never buy a Hyper-V host with fewer than 16 cores total or less than 256 gigabytes of memory and any host that I purchase will absolutely connect to my EMC SAN. I also have a campus agreement for ridiculously cheap Windows Server licenses, a history of volume purchases that nets me extremely favorable hardware pricing, and a budget for IT alone that is larger than many towns. Many other writers are either in the same place as I am or have never worked directly with small businesses and just do not understand what you’re going through.

Let’s set some realistic upper and lower boundaries:

  • I don’t see many people over-recommending CPU, so you’re probably not in any real danger. Most small businesses’ Windows Server instances just need 2 vCPU available, which is enough to avoid contention issues. Few will put any meaningful load on them. I would consider keeping the ceiling at 8 total physical cores in each physical host just so you don’t run afoul of per-core licensing coming in 2016. I’d love to try to give you a hard number, but unfortunately, that’s not possible. However, had I ever had the sort of hardware and hypervisors available to me then that I do now, I don’t think any of my 20-or-fewer user clients would have needed anything larger than six total cores in a host.
  • Memory is also easy to oversize. I would say that a well-proven average for a Windows Server instance running basic services is 2 GB, including the management operating system. Of course, the more roles and applications that you pack into an instance, the more it is likely to need. The big point is, don’t overdo it and make any provider reason out what they’re recommending within a few gigabytes. Don’t try to drop in 128 GB just because everyone else is doing it. If you’re only going to run two virtual machines, 16GB of physical memory is probably more than you’ll need but gives you breathing room at an affordable price point.
  • Disk is where you can save.
    • Internal storage is just fine.
    • RAID-10 is not a requirement, despite what many claim. RAID-5 and -6 are both fine, but spend extra for a hardware controller. If you’re not certain whether the RAID controller advertised for a system is hardware, the giveaway is whether it can be battery-backed. You do want battery backing.
    • For local storage, I would skip Storage Spaces for at least one more version iteration. Battery protection isn’t even an option, for one thing. For another, its redundancy features are handled on your CPUs — the same ones I told you not to oversize. In my opinion, Storage Spaces is still better when it’s used as remote, shared storage. I would back that up by pointing out that Microsoft is one of those vendors that are stymied by the small business.
    • Rather than buying a few very large disks, try for a few more smaller disks. The performance and resiliency are superior, especially when using RAID-5 or -6.
  • 10 GbE is an absurd expenditure for a small business. Ditto RDMA, SR-IOV and other high-end networking features. If you are using internal storage, two to four 1GbE connections are perfect. Add 2 if you’re using iSCSI.
  • VDI’s benefits are in features. It is not a money-saver. The typical small (and medium) business should avoid VDI.
  • OEM pricing usually works out better than volume pricing for a small business. If you have a decent reseller, they will help you verify that. They should not just give you a pricing sheet and leave it to you to figure out, nor should they give you a 5-minute synopsis of whatever the current state of volume licensing is and expect you to decide on your own. Those are signs that you need to find another provider.
  • Your backup solution is part of your server solution, not an afterthought. Price out the software and the supporting hardware at the same time.
  • Buy an uninterruptible power supply for your host. I have seen an uncountable number of instances in which thousands of dollars of server-class equipment went to the recyclers and days of transactions were lost due to refusal to purchase a $250 device. If you’re banking on insurance coverage, they also realize the value of a UPS and that will be one of the first questions they ask when you file your claim.
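Using the per-instance averages above, a quick back-of-the-envelope check for host memory might look like this. The guest list and sizes are hypothetical, standing in for a typical small-business build:

```powershell
# Hypothetical small-business guest list with planned memory in GB
$vms = @{
    'DC01'   = 2   # domain controller, DNS, DHCP
    'FILE01' = 2   # file and print services
    'APP01'  = 4   # line-of-business app with SQL Server Express
}
$managementOS = 2   # reserve for the Hyper-V management operating system

# Total the guests plus the management OS reserve
$totalGB  = ($vms.Values | Measure-Object -Sum).Sum + $managementOS
$hostGB   = 16
$headroom = $hostGB - $totalGB

"Planned: $totalGB GB of $hostGB GB -- $headroom GB of headroom for growth"
```

The exercise matters more than the numbers: if a provider can’t walk you through a table like this for their recommendation, that’s the “reason it out within a few gigabytes” test failing.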

8. This Version is OK

When it comes to technology, don’t try to keep up with the Joneses. I liked Microsoft a lot better when they kept their operating system releases fairly far apart and issued periodic Service Packs. Those days are gone. However, you don’t need to feel pressured to jump to each new version as soon as it comes out. The people that call you on each new release date are not looking out for your best interests. All they can see is “licensing margins” and “engagement fees”.

Upgrade when:

  • The version that you have is no longer sufficient for your needs in a way that is addressed by a newer release.
  • You are purchasing a replacement host.
  • Your current version is nearing the end of its support lifecycle.

What’s Coming

What I’d like to do in future articles is expand #7 from a generic list into some practical build ideas to help guide the perplexed. If you’re a small business or provide service to small businesses, I’d like to hear your stories, suggestions, and questions.

19 Best Practices for a Hyper-V Cluster


It’s not difficult to find all sorts of lists and discussions of best practices for Hyper-V. There’s the 42 best practices for Balanced Hyper-V systems article that I wrote. There is the TechNet blog article by Roger Osborne. Best practices lists are a bit tougher to find for failover clustering, but there are a few if you look. What I’m going to do in this article is focus on the overlapping portion of the Hyper-V/failover clustering Venn diagram. Items that apply only to Hyper-V will be trimmed away and items that would not apply in a failover cluster of roles other than Hyper-V will not be included. What’s also not going to be included is a lot of how-to, otherwise this document would grow to an unmanageable length. I’ll provide links where I can.

As with any proper list of best practices, these are not rules. These are simply solid and tested practices that produce predictable, reproducible results that are known to result in lower cost and effort in deployment and/or maintenance. It’s OK to stray from them as long as you have a defensible reason.

1. Document Everything

Failover clustering is an inherently messy business held together by a precarious configuration. There are a lot of moving parts that are neither fully independent nor fully interdependent. Most things that go wrong will do so without world-stopping consequences up until the point where their cumulative effect is catastrophic. The ease of working with cluster resources masks a complex underpinning that could easily come apart. It is imperative that you keep a log of things that change for your own record and for anyone that will ever need to assist or replace you. Track:

  • Virtual machine adds, removes, and changes
  • Node adds, removes, and changes
  • Storage adds, removes, and changes
  • Updates to firmware and patches
  • All non-standard settings, such as possible/preferred owner restrictions
  • Errors, crashes, and other problems
  • Performance trends
  • Variations from expected performance trends

For human actions taken upon a cluster, what is very important is to track why something was done. Patches are obvious. Why was a firmware update applied? Why was a node added?

For cluster problems, keep track of them. Don’t simply look at a log full of warnings and say, “Oh, I understand them all” and then clear it. You’d be amazed at how useful these histories are. Even if they are benign, what happens if you get a new IT Manager who says, “You’ve known about these errors and you’ve just been deleting them?” Even if you have a good reason, the question is inherently designed to make you look like a fool whether or not that’s the questioner’s intent. You want to answer, “I am keeping a record and continuing to monitor the situation.” Furthermore, if those messages turn out to be not as benign as you thought, you have that record. You can authoritatively say, “This began on…”

2. Automate

Even in a small cluster, automation is vital. For starters, it clears up your time from tedious chores. All of those things that you know that you should be doing but aren’t doing become a whole lot less painful. Start with Cluster-Aware Updating. If you want to get a record of all of those warnings and errors that we were talking about in point #1, how does this suit you?:
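The original script isn’t reproduced here, but the idea can be sketched with Get-WinEvent: pull every warning and error out of the Hyper-V and failover clustering event logs and save them off for your records. The output path is a hypothetical placeholder:

```powershell
# Enumerate all Hyper-V and failover clustering event logs on this node
$logNames = Get-WinEvent -ListLog 'Microsoft-Windows-Hyper-V*',
                                  'Microsoft-Windows-FailoverClustering*' |
    Select-Object -ExpandProperty LogName

# Collect errors (level 2) and warnings (level 3) and export them to CSV;
# SilentlyContinue skips logs that happen to contain no matching events
Get-WinEvent -FilterHashtable @{ LogName = $logNames; Level = 2, 3 } `
             -ErrorAction SilentlyContinue |
    Select-Object TimeCreated, LogName, Id, LevelDisplayName, Message |
    Export-Csv -Path "C:\ClusterLogs\$env:COMPUTERNAME-events.csv" -NoTypeInformation
```

Run it per node, as discussed below, and the CSVs become exactly the kind of dated history that point #1 asks you to keep.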

Be aware that the above script can take quite a while to run, especially if you have a lot of problems and/or haven’t cleared your logs in a while, but it is very thorough. It should be run against every node: the various Hyper-V logs will be different on each system, and while many of the failover clustering logs should be the same, there will be discrepancies. Given how easy the script is to run and how little space its output takes to store, it’s better to just collect them all.

One of the things that I got in the habit of doing is periodically changing the ownership of all Cluster Shared Volumes. While the problem may have since been fixed, I had issues under 2008 R2 with nodes silently losing connectivity to iSCSI targets that they didn’t have live virtual machines on, which later caused problems when an owning node rebooted or crashed and the remaining node could not take over. Having a background script occasionally shuffling ownership served the dual purpose of keeping CSV connectivity alive and allowing the cluster to log errors without bringing the CSV offline. The command is Move-ClusterSharedVolume.
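A background rotation of that kind might be sketched as follows: walk the Cluster Shared Volumes and hand each one to the next node in line. This is an illustration of the idea, not the original script:

```powershell
# Round-robin each Cluster Shared Volume to the next available node
$nodes = Get-ClusterNode | Where-Object State -eq 'Up'

foreach ($csv in Get-ClusterSharedVolume) {
    $current = $csv.OwnerNode.Name
    $index   = [array]::IndexOf($nodes.Name, $current)
    $next    = $nodes[($index + 1) % $nodes.Count].Name

    # Moving ownership exercises each node's storage connectivity
    # without taking the CSV offline
    Move-ClusterSharedVolume -Name $csv.Name -Node $next
}
```

Scheduled off-hours, this doubles as a connectivity check: a node that has silently lost its storage path will surface errors here instead of during a failover.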

Always be on the lookout for new things to automate. There’s a pretty good rule of thumb for that: if it’s not fun and you have to do it once, you can pretty much guarantee that you’ll have to do it again, so it’s better to figure out how to get the computer to do it for you.

3. Monitor Everything

If monitoring a Hyper-V host is important, monitoring a cluster of Hyper-V hosts is doubly so. Not only does a host need to be able to handle its own load, it also needs to be able to handle at least some of the load of at least one other node at any given moment. That means that you need to be keeping an eye on overall cluster resource utilization. In the event that a node fails, you certainly want to know about that immediately. By leveraging some of the advanced capabilities of Performance Monitor, you can, with what I find to be a significant amount of effort, have a cluster that monitors itself and can use e-mail to notify you of issues. If your cellular provider has an e-mail-to-text gateway or you have access to an SMS conversion provider, you can even get hosts to text or page you so that you get urgent notifications quickly. However, if your resources are important enough that you built a failover cluster to protect them, they’re also important enough for you to acquire a proper monitoring solution. This solution should involve at least one system that is not otherwise related to the cluster so that complete outages are also caught.
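The simplest version of the self-notifying cluster is a scheduled check that mails you when any node is down. A sketch; the SMTP host and addresses are made up, and a real deployment should run this from a machine outside the cluster so a total outage still gets caught:

```powershell
# Alert by e-mail if any cluster node is not in the Up state
$down = Get-ClusterNode | Where-Object State -ne 'Up'

if ($down) {
    Send-MailMessage -SmtpServer 'mail.example.com' `
        -From 'cluster@example.com' -To 'admin@example.com' `
        -Subject "Cluster node(s) down: $($down.Name -join ', ')" `
        -Body ($down | Format-List Name, State | Out-String)
}
```

Pair the address with a cellular e-mail-to-text gateway and this same check becomes the urgent page described above.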

Even just taking a few minutes here and there to click through the various sections of Failover Cluster Manager can be beneficial. You might not even know that a CSV is in Redirected Access Mode if you don’t look.

4. Use the Provided Auditing Tools

A quick and easy way to locate obvious problems is to let the system look for them.

It’s almost imperative to use the Cluster Validation Wizard. For one thing, Microsoft will not obligate itself to provide support for any cluster that has not passed validation. For another, it can uncover a lot of problems that you might otherwise never be aware of. Remember that your validation report must be kept up to date. Get a new one if you add or remove any nodes or storage. Technically, you should also update it if you update firmware or drivers, although that’s substantially less critical. Reports are saved in C:\Windows\Cluster\Reports on every node for easy viewing later. This wizard does cause Cluster Shared Volumes to be briefly taken offline, so only run this tool on existing clusters during scheduled maintenance windows.
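The validation wizard is also exposed as a cmdlet, which makes re-running it after every node or storage change easy to script. A minimal sketch:

```powershell
# Re-validate the full cluster from any member node; the HTML report
# lands in C:\Windows\Cluster\Reports by default. Run this during a
# maintenance window -- the storage tests briefly take CSVs offline.
Test-Cluster -Node (Get-ClusterNode).Name
```

Dating and filing each report alongside your change log (point #1) gives you a before-and-after trail for every significant modification.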

Don’t forget about the Best Practices Analyzer. The analyzer for Hyper-V is now rolled into Server Manager. If you combine all the hosts for a given cluster into one Server Manager display, you can run the BPA against them all at once. If you’re accustomed to writing off Server Manager because it was not so useful in previous editions, consider giving it another look. Add additional hosts using the second option on the first page:

Adding other hosts in Server Manager


I don’t want to spend a lot of time on the Best Practices Analyzer in this post, but I will say that the quality of its output is more questionable than a lot of other tools. I’m not saying that it isn’t useful, but I wouldn’t trust everything that it says.

5. Avoid Geographically Dispersed Hyper-V Clusters

Geographically-dispersed clusters, also known as “stretched” clusters or “geo-clusters”, are a wonderful thing for a lot of roles, and can really amp up your “cool” factor and buzzword-compliance, but Hyper-V is really not the best application. If you have an application that requires real-time geographical resilience, then it is incumbent upon the application to provide the technology to enable that level of high availability. The primary limiting factor is storage; Hyper-V is simply not designed around the idea of real-time replicated storage, even using third-party solutions. It can be made to work, but doing so typically requires a great deal of architectural and maintenance overhead.

If an application does not provide the necessary features and you can afford some downtime in the event of a site being lost or disconnected, Hyper-V Replica is the preferred choice. Build the application to operate in a single site and replicate it to another cluster in another location. If the primary site is lost, you can quickly fail over to the secondary site. A few moments of data will be lost and there will be some downtime, but the amount of effort to build and maintain such a deployment is a fraction of what it would take to operate a comparable geo-cluster.

Of course, “never say never” wins the day. If you must build such a solution, remember to leverage features such as Possible Owners. Take care with your quorum configuration, and make doubly certain that absolutely everything is documented.
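As a sketch of that advice, Possible Owners and quorum are both easy to reach from PowerShell. The node and group names here are hypothetical:

```powershell
# Pin a clustered VM's group to the nodes in its own site so it can't
# fail over across the WAN link (names are placeholders):
Get-ClusterGroup -Name "SQLVM1" |
    Set-ClusterOwnerNode -Owners "SiteA-HV1", "SiteA-HV2"

# Review the current quorum configuration before and after any change:
Get-ClusterQuorum
```

Keeping the output of both commands in your documentation makes the “make doubly certain everything is documented” part much easier.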

6. Strive for Node Homogeneity

Microsoft does not strictly require that all of the nodes in a cluster have the same hardware, but you should make it your goal. Documentation is much easier when you can say “5 nodes of this” rather than maintaining a pile of different build sheets that note every variance.

This is a bigger deal for Hyper-V than for most other clustered roles. There aren’t any others that I’m aware of that have any noticeable issues when cluster nodes have different CPUs, beyond the expected performance differentials. Hyper-V, on the other hand, requires virtual machines to be placed in CPU compatibility mode or they will not Live Migrate. They won’t even Quick Migrate unless turned off. The end effects of CPU compatibility mode are not documented in an easy-to-understand fashion (you can take a look), but it is absolutely certain that the full capabilities of your CPU are not made available to any virtual machine in compatibility mode. The effective impact depends entirely upon what CPU instructions are expected by the applications on the guests in your Hyper-V cluster, and I don’t know that any software manufacturer publishes that information.

Realistically, I don’t expect that setting CPU compatibility mode for most typical server applications will be an issue. However, better safe than sorry.

7. Use Computer-Based Group Policies with Caution

Configurations that don’t present any issues on stand-alone systems can cause problems when those same systems are clustered. A notorious right that causes Live Migration problems when tampered with is “Create Symbolic Links”. It’s best to either avoid computer-scoped policies or only use those that are known and tested to work with Hyper-V clustering. For example, the GPO templates that ship with the Microsoft Baseline Security Analyzer will not cause problems, with only one potential exception: they disable the iSCSI service. Otherwise, use them as-is.

8. Develop Patterns and Practices that Prevent Migration Failures

Using dissimilar CPUs and bad GPOs aren’t the only ways that a migration might fail. Accidentally creating a virtual machine with resources placed on local storage is one potential problem. A practice that will avoid this is to always change the default locations on every host to shared storage. This helps control for human errors and for the (now fixed) bug in Failover Cluster Manager where it sometimes caused some components to be placed in the default storage location when it created virtual machines. A related pattern is to discourage the use of Failover Cluster Manager to create virtual machines.
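A hedged sketch of that practice in PowerShell, with placeholder host names and CSV paths:

```powershell
# Point each host's default VM and virtual disk locations at shared
# storage so an absent-minded "Next, Next, Finish" can't land a VM on C:.
Set-VMHost -ComputerName "HV1", "HV2", "HV3" `
    -VirtualMachinePath "C:\ClusterStorage\Volume1" `
    -VirtualHardDiskPath "C:\ClusterStorage\Volume1\Virtual Hard Disks"
```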

A few other migration-breakers:

  • Always use consistent naming for virtual switches. A single-character difference in a virtual switch name will prevent a Live/Quick Migration.
  • Avoid using multiple virtual switches in hosts to reduce the possibility that a switch naming mismatch will occur.
  • Do not use private or internal virtual switches on clustered hosts. A virtual machine cannot Live Migrate if it is connected to a switch of either type, even if the same switch appears on the target node.
  • Use ISO images from shared storage. If you must use an image hosted locally, remember to eject it as soon as possible.
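On that last point, a periodic sweep for forgotten ISOs is easy to script. This sketch is mine, not from the article; the host name is a placeholder:

```powershell
# Find every VM on a host that still has an ISO mounted, then eject it.
Get-VM -ComputerName "HV1" | Get-VMDvdDrive |
    Where-Object { $_.DvdMediaType -eq 'ISO' } |
    Set-VMDvdDrive -Path $null
```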

9. Use Multiple Shared Storage Locations

The saying, “Don’t place all your eggs in one basket,” comes to mind. Even if you only have a single shared storage location, break it up into smaller partitions and/or LUNs. The benefits are:

  • Logical separation of resources; examples: general use storage, SQL server storage, file server storage.
  • Performance. If your storage device doesn’t use tiering, you can take advantage of two basic facts about spinning disks: performance is better for data closer to the outer edge of the platter (where the first data is written) and data is more quickly accessed when it is physically close together on the platter. While I don’t worry very much about either of these facts and have found all the FUD around them to be much ado about nothing, there’s no harm in leveraging them when you can. Following the logical separation bullet point, I would place SQL servers in the first LUNs or partitions created on new disks and general-purpose file servers in the last LUNs. This limits how much fragmentation will affect either and keeps the more performance-sensitive SQL data in the optimal region of the disk.
  • An escape hatch. In the week leading up to writing this article, I encountered some very strange problems with the SMB 3 share that I host my virtual machines on. I tried for hours to figure out what it was and finally decided to give up and recreate it from scratch. Even though I only have the one system that hosts storage, it had a traditional iSCSI Clustered Shared Volume on it in addition to the SMB 3 share. I used Storage Live Migration to move all the data to the CSV, deleted and recreated the share, and used Storage Live Migration to move all the virtual machines back.
  • Defragmentation. As far as I’m concerned, disk fragmentation is far and away the most overblown topic of the modern computing era. But, if you’re worried about it, using Storage Live Migration to move all of your VMs to a temporary location and then moving them all back will result in a completely defragmented storage environment with zero downtime.
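The escape-hatch and defragmentation bullets both lean on Storage Live Migration, which is fully scriptable. A sketch with placeholder paths (not the commands I ran that week):

```powershell
# Move every VM whose files live on one CSV over to another, with the
# VMs running the whole time:
Get-VM -ComputerName "HV1" |
    Where-Object { $_.Path -like 'C:\ClusterStorage\Volume1*' } |
    Move-VMStorage -DestinationStoragePath "C:\ClusterStorage\Volume2"
```

Run it once to evacuate, fix or recreate the source location, then run it again in the other direction.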

10. Use at Least Two Distinct Cluster Networks, Preferably More

As you know, a cluster will define networks based on unique TCP/IP subnets. What some people don’t know is that it will create distinct TCP/IP streams for inter-node communication based on this fact. So, some people will build a team of network adapters and only use one or two cluster networks to handle everything: management, cluster communications such as heartbeating, and Live Migration. Then they’ll be surprised to discover network contention problems, such as heartbeat failures during Live Migrations. This is because, without the creation of distinct cluster networks, it might attempt to co-opt the same network stream for multiple functions. All that traffic would be bound to only one or two adapters while the others stay nearly empty. Set up multiple networks to avoid this problem. If you’re teaming, create multiple virtual network adapters on the virtual switch for the hosts to use.
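One possible sketch of that converged setup; the switch name, adapter names, and addresses are all assumptions of mine:

```powershell
# Give the management OS separate virtual adapters on the teamed switch
# so the cluster sees distinct networks for each function:
Add-VMNetworkAdapter -ManagementOS -SwitchName "vSwitch" -Name "Cluster"
Add-VMNetworkAdapter -ManagementOS -SwitchName "vSwitch" -Name "LiveMigration"

# Each adapter then gets an address in its own subnet, for example:
New-NetIPAddress -InterfaceAlias "vEthernet (Cluster)" `
    -IPAddress 192.168.10.11 -PrefixLength 24
New-NetIPAddress -InterfaceAlias "vEthernet (LiveMigration)" `
    -IPAddress 192.168.20.11 -PrefixLength 24
```

Because the subnets differ, the cluster will register them as separate cluster networks and keep heartbeat and Live Migration traffic in distinct streams.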

11. Minimize the Number of Network Adapter Teams

In the 2008 R2 days, people would make several teams of 1GbE adapters: one for management traffic, one for cluster traffic, and one for Live Migration. Unfortunately, people are still doing that in 2012+. Please, for your own sake, stop. Converge all of these into a single team if you can. It will result in a much more efficient and resilient utilization of hardware.

12. Do Not Starve Virtual Machines to Benefit Live Migration or Anything Else

It’s really disheartening to see 35 virtual machines crammed onto a few gigabit cards and a pair of 10 GbE cards reserved for Live Migration, or worse, iSCSI. Live Migration can wait and iSCSI won’t use that kind of bandwidth often enough to be worth it. If you have two 10 GbE cards and four built-in 1 GbE ports, use the gigabit ports for iSCSI and/or Live Migration. Better yet, just let them sit empty and use convergence to put everything on the 10 GbE adapters. Everything will be fine. Remember that your Hyper-V cluster is supposed to be providing services to virtual machines; forcing the virtual machines to yield resources to cluster services is a backwards design.

13. Only Configure QoS After You’ve Determined that You Need QoS

I’ve seen so many people endlessly wringing their hands over “correctly” configured QoS prior to deployment that I’ve long since lost count. Just stop. Set your virtual switches to use the “Weight” mode for QoS and leave everything at defaults. If you want, set critical things to have a few guaranteed percentage points, but stop after that. Get your deployment going. Monitor the situation. If something is starved out, find out what’s starving it and address that because it’s probably a run-away condition. If you can’t address it because it’s normal, consider scaling out. If you can’t scale out, then configure QoS. You’ll have a much better idea of what the QoS settings should be when you actually have a problem to address than you ever will when nothing is wrong. The same goes for Storage QoS.
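To illustrate, a minimal sketch of “Weight mode, mostly defaults”. The names and the 10-percent figure are placeholders of mine, not recommendations from the article:

```powershell
# Create the switch in Weight mode up front; the mode can't be changed
# after the switch exists:
New-VMSwitch -Name "vSwitch" -NetAdapterName "Team1" `
    -MinimumBandwidthMode Weight -AllowManagementOS $false

# Guarantee a few percentage points to one critical management-OS
# adapter (it must already exist), then stop tuning:
Set-VMNetworkAdapter -ManagementOS -Name "LiveMigration" `
    -MinimumBandwidthWeight 10
```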

14. Always be Mindful of Licensing

I’m not going to rehash the work we’ve already done on this topic. You should know by now that every Windows instance in a virtual machine must have access to a fully licensed virtualization right on every physical host that it ever operates on. This means that, in a cluster environment, you’re going to be buying lots and lots of licenses. That part, we’ve already explained into the ground. What some people don’t consider is that this can affect the way that you scale a cluster. While Microsoft is taking steps in Windows Server 2016 licensing to increase the cost of scaling up on a single host, it’s still going to be cheaper for most people in most situations than scaling out, especially for those people that are already looking at Datacenter licenses. In either licensing scheme, keep in mind that most people are not actually driving their CPUs nearly as hard as they could. Memory is likely to be a bigger bottleneck to same-host scaling than CPU.

15. Keep in Mind that Resources Exhaust Differently in a Cluster

When everything is OK, virtual machines in a cluster really behave exactly like virtual machines scattered across separate stand-alone hosts. But, when something fails or you try to migrate something, things can get weird. A migration might fail for a virtual machine because it is currently using more RAM than is available on the target host. But, if its host failed and the destination system is recovering from a crash condition, it might succeed. That’s because the virtual machine’s Startup Dynamic RAM setting is likely to be lower than its running RAM.

Of course, that’s only talking about a single virtual machine. What about the much more probable scenario that a host has crashed and several VMs need to move? That’s when all of those priority settings come into play. If you have more than two nodes in your cluster, the cluster service will do its best job of getting everyone online wherever they fit. But, you need to have decided in advance which virtual machines were most important. If you haven’t, then the creep of virtual machine sprawl might have left you in a situation of needing to make some hard decisions to turn off healthy virtual machines in order to bring up vital crashed machines. Manual intervention defeats a lot of the purpose of clustering.
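Those priority settings are quick to audit and set in advance. A sketch with hypothetical VM names; the numeric values map to the High/Medium/Low/No Auto Start choices in Failover Cluster Manager:

```powershell
# 3000 = High, 2000 = Medium (default), 1000 = Low, 0 = No Auto Start
(Get-ClusterGroup -Name "DC1").Priority = 3000
(Get-ClusterGroup -Name "TestVM7").Priority = 1000

# Review what the cluster will try to start first after a node crash:
Get-ClusterGroup | Sort-Object -Property Priority -Descending |
    Format-Table -Property Name, OwnerNode, Priority
```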

Shared storage behaves differently than local storage in more than one way. Sure, it’s remote, and that has its own issues. But, the shared part is what I’m focusing on. Remember how I said it wasn’t a good idea to put your iSCSI on 10 GbE if it would take away from virtual machines? This is why. Sure, maybe, just maybe, your storage really can supply 20 Gbps of iSCSI data. But can it simultaneously supply 60 Gbps for 3 Hyper-V hosts that each have two 10 GbE NICs using MPIO? If it can, do you really only have three hosts accessing it? (If you answered “yes” to both, most admins in the world are jealous.) The point here is that the size of the pipe that a host has into your storage sets the upper limit on how completely that host can control, and potentially dominate, that storage system’s resources. Remember that it’s really tough to get a completely clear image of the way that storage performance is being consumed in a cluster when compared to a standalone system.

16. Do Not Cluster Virtual Machines that Don’t Need It

Don’t make your virtualized domain controllers into highly available virtual machines. The powers of Hyper-V and failover clustering pale in comparison to the native resiliency features of Active Directory. If you have any other application with similar powers, don’t make it HA via Hyper-V clustering either. Remember that you have to make licenses available everywhere that a VM might run anyway. If it provides better resiliency to just create a separate VM and configure application high availability, choose that route every time.

Both groups require the same licensing, but the second group is more resilient

17. Don’t Put Non-HA Machines on Cluster Shared Volumes

Storage is a confusing thing and it seems like there are no right or wrong answers sometimes, but you do need to avoid placing any non-HA VMs on a CSV. Microsoft doesn’t support it and it will cause VMM to panic. Things can get a little “off” when the node that owns the CSV isn’t the node that owns the non-HA virtual machine placed on it. It’s also confusing when you’re looking at a CSV and see more virtual machine files than you can account for in Failover Cluster Manager. Use internal space, singularly attached LUNs, or SMB 3 storage for non-HA VMs.

18. Minimize the Use of non-HA VMs in a Cluster

Any time you use a non-HA VM on a clustered host, make sure that it’s documented and preferably noted right on the VM. This helps eliminate confusion later. Some helpful new admin might think you overlooked something and HA the VM for you, even though you had a really good reason not to do so. I’m not saying “never”; I do it myself in my system for my virtualized domain controllers. But, if I had extra stand-alone Hyper-V hosts, that’s where I’d put my non-HA VMs.

19. Ask Somebody

If you don’t know, start with your favorite search engine. The wheels for both Hyper-V and failover clustering are very round, and there are lots of people that have written out lots of material on both subjects.

If that fails (and please, only if that fails), ask somebody. My favorite place is the TechNet forums, but that’s only one of many. However, on behalf of myself and everyone else who writes publicly, please don’t try to contact an individual directly. There’s a reason that companies charge for support; it’s time-consuming and we’re all busy people. Try a community first.


Hyper-V & PowerShell: How to Retrieve Available Host Memory

One of the things I commonly lament is the poor state of the management tools available for Hyper-V (from Microsoft; I’m pointedly not talking about third-party solutions). One issue I see a lot is that there isn’t a quick way, when looking at the Hyper-V-specific tools, to know how much free memory a host has. People then resort to other tools, like Task Manager, to find out. These methods are usually effective, but imperfect. Sometimes, you are unable to match up what those tools display against what happens in Hyper-V.

I could write out a long and complicated script that would display some fairly detailed information on memory usage in your systems, and someday I might do that. However, as this article is part of the ongoing Hyper-V and PowerShell series, the primary focus of this article will be to help you get to the information that you need as quickly as possible with steps that you have a chance of remembering. The secondary lessons in this article are to introduce you to custom objects and basic error handling in PowerShell.

The first stop on the PowerShell memory-exploration train will be WMI. Many people get a quick look at WMI and run away screaming. I can’t blame them. WMI is one of those things that can actually get scarier as you learn more about it. However, there’s no denying the raw power that you can harness with it. In this case, you can breathe easy; it’s fairly trivial to get sufficient information about memory from WMI. Even better, there isn’t a language out there that can interact with WMI as easily as PowerShell. To find out how much memory you’ve got left in your system, you only need to ask the Win32_OperatingSystem class:
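The code listing did not survive this copy of the article. A query along these lines fits the discussion that follows; note that Win32_OperatingSystem reports FreePhysicalMemory in kilobytes:

```powershell
# CSName is the computer name; FreePhysicalMemory is in KB.
Get-WmiObject -Class Win32_OperatingSystem |
    Select-Object -Property CSName, FreePhysicalMemory
```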

Wow! Looks like a whole lot of typing and things to remember, doesn’t it? Well, it only looks that way because I have a policy of showing you the entirety of my PowerShell cmdlets because it makes things a lot easier to decode, comprehend, tear apart, and tinker with later. You don’t really need all of that. This will do just as well:
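The abbreviated listing is also missing here; using the standard aliases, it was presumably something like:

```powershell
gwmi Win32_OperatingSystem | select CSName, FreePhysicalMemory
```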

You don’t even have to capitalize Win32_OperatingSystem, if you don’t want to.

You can run it for multiple hosts simultaneously using implicit remoting:
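That listing is missing as well. Since Get-WmiObject accepts multiple computer names and stamps each result with its source, a reasonable reconstruction is (host names are placeholders):

```powershell
Get-WmiObject -ComputerName "svhv1", "svhv2" -Class Win32_OperatingSystem |
    Select-Object -Property CSName, FreePhysicalMemory
```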

How accurate is it? Well, that’s a two-part answer. I’ll tackle the easy one first. These two screenshots were taken from the same system at about the same time:

Task Manager Memory Check

GWMI Memory Check

As you can see, they’re quite close. Memory may fluctuate a bit from moment to moment and the precision of Task Manager isn’t the same as the precision of WMI so you shouldn’t expect a perfect match-up, but they’re close enough.

Unfortunately, this isn’t the entire story. The host has used up what it needs, the other guests have used up what they need, so you should be able to start up a guest with something around 2.6 GB of Startup RAM assigned, correct? The answer: maybe. There have been more than a few times when people have been unable to start a guest that was definitely below the Available line. So, what gives?

The answer to that is also the second part of the accuracy question around the WMI call. In the Task Manager screenshot, do you see the small, thin line in the Memory Composition graph? Hold your mouse over it, and you get this:

Standby Memory

The segment after the used (shaded area) and before the completely unused (clear area at the far right) represents Standby memory. Just as the tooltip says, it “contains cached data and code that is not actively in use”. Basically, this is stuff that the system could page out to disk but, since there’s currently no pressing need to release it, is holding in RAM. Technically, this memory is available. When I was testing for this article, I was able to start a virtual machine with 2 GB of fixed RAM without trouble. However, I’ve fielded questions from people in similar situations that could not start VMs that were within the “Available” range. My only guess is that the host couldn’t page enough of it for some reason. Without being there to investigate, and never having had it happen to me, I can’t definitively say what was going on. But, that’s not the point of this article.

What would really be nice to know is how much RAM is unquestionably available. I didn’t screen shot it, but in Task Manager, if you continue sliding the mouse to the right into the last segment of the Memory composition graph, it will show in the tool tip how much memory is completely open. But what about PowerShell? There’s probably some WMI way to do it, but I don’t know what that is — one of the worst things about WMI is discoverability. Most things in WMI are either painfully obvious or painfully obscure without a lot of middle ground. You could go digging around in WMI Explorer to see if there’s a field to query. Fortunately, there’s a really easy way just for us Hyper-V users:
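Judging by the caption below, the missing listing is the Hyper-V module’s NUMA cmdlet:

```powershell
Get-VMHostNumaNode
```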

That’s all you need. Sort of. On one of my test systems, this is what I get:

Simple Get-VMHostNumaNode

The MemoryAvailable field is quite accurate and shows an almost identical number to what is displayed as Free in Task Manager. As I went back and forth, they seemed to stay within a couple dozen megabytes of each other, which is likely accounted for by the brief overhead of running the PowerShell cmdlet. I know that if I want to start a virtual machine with 1,340 megabytes or less of Startup memory, it will work.

But (there’s always a “but”), there is a problem. In the displayed system, this is a useful readout because I only have a single NUMA node in my test lab systems. If you’ve got a dual or quad socket host, then you have multiple NUMA nodes. Each node is going to get its own separate statistics set. If you’re not using NUMA spanning, then that’s OK; the largest MemoryAvailable reading represents the largest VM you’ll be guaranteed to be able to start. But, most of us are using NUMA spanning (there are precious few good reasons to turn it off). Unfortunately, there’s no quick and simple way to funnel all of that into a single reading. Sure, I could spend some time and craft a clever one-liner that would do the trick, but such things are very difficult to understand, almost impossible to remember, and day-to-day operations are not good places to use those clever one-liners. So, I worked up a script that is relatively simple but still shows you what you need to know:
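The script listing itself did not make it into this copy of the article. Based on the description in the “PowerShell Lessons” section below (the FreeMemoryBounds custom object, the single shared try block, the foreach over ComputerName), a reconstruction would look something like this; treat it as my approximation rather than the original listing:

```powershell
function Get-VMHostAvailableMemory
{
    <#
    .SYNOPSIS
        Reports the guaranteed (lower bound) and potential (upper bound)
        free memory, in megabytes, of one or more Hyper-V hosts.
    #>
    [CmdletBinding()]
    param(
        [Parameter(Position=0, ValueFromPipeline=$true)]
        [String[]]$ComputerName = @($env:COMPUTERNAME)
    )
    process
    {
        foreach ($Computer in $ComputerName)
        {
            try
            {
                # Both retrievals share one try block: if the first fails,
                # the second is never attempted, and no partially-built
                # object can reach the pipeline.
                $NumaNodes = Get-VMHostNumaNode -ComputerName $Computer -ErrorAction Stop
                $OSData = Get-WmiObject -ComputerName $Computer -Class Win32_OperatingSystem -ErrorAction Stop

                # Lower bound: truly free RAM, summed across all NUMA nodes
                # (MemoryAvailable is already expressed in MB).
                $LowerBound = ($NumaNodes | Measure-Object -Property MemoryAvailable -Sum).Sum
                # Upper bound: "available" RAM, which includes standby memory
                # the host may or may not release (FreePhysicalMemory is in KB).
                $UpperBound = [int]($OSData.FreePhysicalMemory / 1KB)

                $FreeMemoryBounds = New-Object -TypeName PSObject
                Add-Member -InputObject $FreeMemoryBounds -MemberType NoteProperty -Name 'PSComputerName' -Value $Computer
                Add-Member -InputObject $FreeMemoryBounds -MemberType NoteProperty -Name 'LowerBoundMB' -Value $LowerBound
                Add-Member -InputObject $FreeMemoryBounds -MemberType NoteProperty -Name 'UpperBoundMB' -Value $UpperBound

                # The object on a line by itself places it into the pipeline.
                $FreeMemoryBounds
            }
            catch
            {
                # Pass the error along without letting it taint the output.
                Write-Error -ErrorRecord $_
            }
        }
    }
}
```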

This is designed to be dot-sourced, so save it as a .PS1 (Get-VMHostAvailableMemory.ps1 would be good) and then dot-source it from a PowerShell prompt:
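For example (the path and host names are illustrative):

```powershell
. .\Get-VMHostAvailableMemory.ps1
Get-VMHostAvailableMemory -ComputerName svhv1, svhv2
```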

You can also add it to your profile using steps you can find most anywhere, including earlier in our series. Another option would be to add a single line at the very end of the script with only “Get-VMHostAvailableMemory”; you can then run the script directly, but you can’t use Get-Help on it and you can’t feed it ComputerName qualifiers (well, you can, but they won’t do anything useful).

PowerShell Lessons from this Script

I tried to keep the complexity to a minimum but there are some new tricks in this script that I haven’t shown you before.

PowerShell Custom Objects

First is the usage of New-Object and Add-Member. I don’t want to use Write-Host or anything of the sort because it is inappropriate here, but I do need to ensure that I’m not just dumping raw numbers on you without context. The New-Object line creates a blank object, conveniently called “FreeMemoryBounds”. If you supplied more than one computer name, the first Add-Member line appends a property called PSComputerName and populates it with the name that you submitted; this is to simulate the action of a normal implicit remoting command since this script masks the underlying remoting capabilities of the cmdlets that it relies on. The next two lines add properties named “LowerBoundMB” and “UpperBoundMB”.

After all the processing is done, there is a single line near the end that just contains the object all by itself. What that does is place the object into the pipeline. Once the script is done, you’re given a full PowerShell object named “FreeMemoryBounds” and it will have these properties attached to it. You can pass that object to other routines and access its properties using a dot just like you can any other PowerShell object. If you just run the script, then it outputs these fields to the screen in an easy-to-follow format. Also, because of the “foreach” loop, you’ll be given one object per valid ComputerName input.

PowerShell Error Handling with Try-Catch

The second thing is the usage of try-catch. I had two major reasons to use it here. The first is that, without it, it would be trivial to create a FreeMemoryBounds object that had invalid data. For just one host queried in an interactive session, that wouldn’t be a problem. If you were performing an automated run against 30 hosts, that would be a much bigger problem. The second reason is that, without the try block, an invalid computer name (or blocked firewall, or insufficient permissions, etc.) would result in both of the information retrieval cmdlets being run, even though an error condition on the first one is enough to let you know that the second isn’t worth the effort. Again, not a big deal for a single iteration, but problematic when there are many.

I chose to take a very simple route. I took the two cmdlets that are most likely to fail and put them into a single try block. Also, because I only want the new FreeMemoryBounds object to be placed in the pipeline if it has valid data, I included it in the same try block. That way, if either of the two cmdlets before it fails, that line is skipped and the object is silently destroyed. If an error does occur, I just emulate the same mechanism that PowerShell uses to place the error into the error stream (Write-Error). I do this because I don’t want to suppress the error but I also don’t care about the error itself. What I care about is not allowing an error to affect the product of my script.

There is one very, very important lesson to be learned. I even see some of the top-tier PowerShell experts make mistakes here. For every cmdlet exception that you wish to be caught in a try block, you must include the ErrorAction parameter and set it to “Stop”. PowerShell’s default error action is to display the error and keep going (“Continue”). Some people override that on their systems, but most people won’t. If the default action isn’t Stop and you don’t set ErrorAction to Stop, then the try block leads a pointless existence that just makes your script a little harder to read.

While this is important to remember, especially because it’s not intuitive, it also highlights a special feature(?) of PowerShell that you won’t find in traditional computer programming languages: if you have a cmdlet inside a try block that you don’t want to trigger the error handling system, then just set its ErrorAction to something other than “Stop”. Ordinarily, you would just keep cmdlets of that kind outside a try block in the first place, but this capability grants you quite a bit of flexibility to selectively ignore and capture error conditions as necessary.


Hyper-V Page File Settings

When you install Hyper-V or a copy of Windows Server for the express purpose of running the Hyper-V role, its default configuration for the page file (also called a swap file) is generally wasteful, although not harmful. Page files for individual virtual machines are tuned in the same fashion as normal physical machines, but there are a couple of things to think about that are unique to VMs.