How to Compact a VHDX with a Linux Filesystem

Microsoft’s compact tool for VHD/X works by deleting empty blocks. “Empty” doesn’t always mean what you might think, though. When you delete a file, almost every file system simply removes its entry from the allocation table. That means that those blocks still contain data; the system simply removes all indexing and ownership. So, those blocks are not empty. They are unused. When a VHDX contains file systems that the VHDX driver recognizes, it can work intelligently with the contained allocation table to remove unused blocks, even if they still contain data. When a VHDX contains file systems commonly found on Linux (such as the various iterations of ext), the system needs some help.

Making Some Space

Before we start, a warning: don’t even bother with this unless you can reclaim a lot of space. There is no value in compacting a VHDX just because it exists. In my case, I had something go awry in my system that caused the initramfs system to write gigabytes of data to its temporary folder. My VHDX that ordinarily used around 5 GB ballooned to 50GB in a short period of time.

Begin by getting your bearings. df can show you how much space is in use. I neglected to get a screen shot prior to writing this article, but this is what I have now:

cpctlvhd_baseline

At this time, I’m sitting at a healthy 5% usage. When I began, I had 80% usage.

Clean up as much as you can. Use apt autoremove, apt autoclean, and apt clean on systems that use apt. Use yum clean all on yum systems. Check your /var/tmp folder. If you’re not sure what’s consuming all of your data, du can help. To keep it manageable, target specific folders. You can save the results to a file like this:
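The exact command didn't survive in this copy, but the idea is to run du against the suspect folder and redirect its output to a file in your home directory (the file name below matches the one referenced in the next paragraph):

    sudo du /var/tmp > /home/<your account>/var-temp-du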

You can then open the /home/<your account>/var-temp-du file using WinSCP. It’s a tab-delimited file, so you can manipulate it easily. Paste into Excel, and you can sort by size.

More user-friendly downloadable tools exist. I tried gt5 with some luck.

As I mentioned before, I had gigabytes of files in /var/tmp created by initramfs. I’m not sure what it used to create the names, but they all started with “initramfs”. So, I removed them that way: rm /var/tmp/initramfs* -r. That alone brought me down to the lovely number that you see above. However, as you’re well aware, the VHDX remains at its expanded size.

Don’t forget to df after cleanup! If the usage hasn’t changed much, then I’d stop here and either find something else to delete or find something else to do altogether.

Zeroing a VHDX with an ext Filesystem

I assume that this process will work with any file system at all, but I’ve only tested with ext4. Your mileage may vary.

Because the VHDX cannot parse the file system, it can only remove blocks that contain all zeros. With that knowledge, we now have a goal: zero out unused blocks. We’ll need to do that from within the guest.

Preferred Method: fstrim

My personal favorite method for handling this is the “fstrim” utility. Reasons:

  • fstrim works very quickly
  • fstrim doesn’t cause unnecessary wear on SSDs but still works on spinning rust
  • fstrim ships in the default tool set of most distributions
  • fstrim is ridiculously simple to use

Usage:
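A minimal invocation, assuming you want to trim the root file system; the -v switch reports how much space was trimmed:

    sudo fstrim -v /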

On my system that had recently shed over 70 GB of fat, fstrim completed in about 5 seconds.

Note: according to some notes that I found for Ubuntu, it automatically performs an fstrim periodically. I assume that you’re here because you want this done now, so this information probably serves mostly as FYI.

Alternative Zeroing Methods

If fstrim doesn’t work for you, then we need to look at tools designed to write zeros to unused blocks.

I would caution you away from using security tools. They commonly make multiple passes of non-zero writes, which serves a security purpose on magnetic media: an analog reader can detect residual charge levels too faint to register as a “1” on the drive’s digital head and interpret them as earlier write operations. After three forced writes to the same location, even analog equipment can’t recover anything. On an SSD, though, those extra writes mostly just reduce its lifespan. Also, non-zero writes are utterly pointless for what we’re doing. Some security tools will write all zeros. That’s better, but they also make multiple passes. We only need one.

Create a File from /dev/zero

Linux includes a nifty built-in tool that just generates zeroes until you stop asking. You can leverage it by “reading” from it and outputting to a file that you create just for this purpose.
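A sketch of the idea; the file name is arbitrary, and the dd run is expected to end with a “No space left on device” error once the free space is consumed:

    dd if=/dev/zero of=~/zero.dat bs=1M
    rm ~/zero.dat
    sync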

On a physical system, this operation would always take a very long time because it literally writes zeros to every unused block in the file system. Hyper-V will realize that the bits being written are zeroes. So, when it hits a block that hasn’t already been expanded, it will just ignore the write. However, the blocks that do contain data will be zeroed, so this can still take some time. So, it’s not nearly as fast as fstrim, but it’s also not going to make the VHDX grow any larger than it already is.

zerofree

The “zerofree” package can be installed with your package manager from the default repository (on most distributions). It has major issues that might be show-stoppers:

  • I couldn’t find any way to make it work with LVM volumes. I found some people that did, but their directions didn’t work for me. That might be because of my disk system, because…
  • It’s not recommended for ext4 or xfs file systems. If your Linux system began life as a recent version, you’re probably using ext4 or xfs.
  • Zerofree can’t work with mounted file systems. That means that it can’t work with your active primary file system.
  • You’ll need to detach the disk and attach it to another Linux guest. You could also use something like a bootable recovery disk that has zerofree.

Once you’ve attached the disk to a foreign system, run sudo lsblk -f to locate it and its file systems:

Verify that the target volume/file system does not appear in df. If it shows up in that list, you’ll need to unmount it before you can work with it.

I’ve highlighted the only volume on my added disk that is safe to work with. It’s a tiny system volume in my case, so zeroing it probably won’t do a single thing for me. I’m showing you this in the event that you have an ext2 or ext3 file system in one of your own Linux guests with a meaningful amount of space to free. Once you’ve located the correct partition whose free space you wish to clear:
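Assuming the partition turned out to be /dev/sdb1 (substitute your own), the command looks like this; -v shows progress:

    sudo zerofree -v /dev/sdb1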

Search!

In my research for this article, I found a number of search hits that looked somewhat promising. If nothing here works for you, look for other ways. Remember that your goal is to zero out the unused space in your Linux file system.

Compact the VHDX

The compact process itself does not differ, regardless of the contained file system. If you already know how to compact a dynamically-expanding VHDX, you’ll learn nothing else from me here.

As with the file delete process, I always recommend that you look at the VHDX in Explorer or the directory listing of a command/PowerShell prompt so that you have a “before” picture of the file’s size.

Use PowerShell to Compact a Dynamically-Expanding VHDX

The owning virtual machine must be Off or Saved. Do not compact a VHDX that is a parent of a differencing disk. It might work, but really, it’s not worth taking any risks.

Use the Optimize-VHD cmdlet to compact a VHDX:
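A sketch, with a placeholder path that you’d replace with your own:

    Optimize-VHD -Path 'C:\LocalVMs\Virtual Hard Disks\svlinux.vhdx' -Mode Full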

The help for that cmdlet indicates that -Mode Full “scans for zero blocks and reclaims unused blocks”. However, it then goes on to say that the VHDX must be mounted in read-only mode for that to work. The wording is unclear and can lead to confusion. The zero block scan should always work. The unused block part requires the host to be able to read the contained file system — that’s why it needs to be mounted. The contained file system must also be NTFS for that to work at all. All of that only applies to blocks that are unused but not zeroed. The above exercise zeroed those unused blocks. So, this will work for Linux file systems without mounting.

Use Hyper-V Manager to Compact a Dynamically-Expanding VHDX

Hyper-V Manager connects you to a VHDX tool to provide “editing” capabilities. The options for “editing” include compacting. It can work for VHDXs that are attached to a VM or are sitting idle.

Start the Edit Wizard on a VM-Attached VHDX

The virtual machine must be Off or Saved. If the virtual machine has checkpoints, you will be compacting the active VHDX.

Open the property sheet for the virtual machine. On the left, highlight the disk to compact. On the right, click the Edit button.

cpctlvhd_vmdiskselect

Jump past the next sub-section to continue.

Start the Edit Wizard on a Detached VHDX

The VHDX compact tool that Hyper-V Manager uses relies on a Hyper-V host. If you’re using Hyper-V Manager from a remote system, that means something special to you. You must first select the Hyper-V host that will be performing the compact, then select the VHDX that you want that host to compact.

Select the host first:

cpctlvhd_hostselect

Now, you can either right-click on that host and click Edit Disk or you can use the Edit Disk link in the far right Actions pane; they both go to the same wizard.

cpctlvhd_editdisk

The first screen of the wizard is informational. Click Next on that. After that, you’ll be at the first actionable page. Read on in the next sub-section.

Using the Edit Disk Wizard to Compact a VHDX

Both of the above processes will leave you on the Locate Disk page. The difference is that if you started from a virtual machine’s property sheet, the disk selector will be grayed out. For a standalone disk, enter or browse to the target VHDX. Remember that the dialog and tool operate from the perspective of the host. If you connected Hyper-V Manager to a remote host, there may be delegation issues on SMB-hosted systems.

cpctlvhd_locatedisk

On the next screen, choose Compact:

cpctlvhd_compactoption

The final page allows you to review and cancel if desired. Click Finish to start the process:

cpctlvhd_wizfinish

Depending on how much work it has to do, this could be a quick or slow process. Once it’s completed, it will simply return to the last thing you were doing. If you started from a virtual machine, you’ll return to its property sheet. Otherwise, you’ll simply return to Hyper-V Manager.

Check the Outcome

Locate your VHDX in Explorer or a directory listing to ensure that it shrank. My disk has returned to its happy 5GB size:

cpctlvhd_results

 

4 Ways to Transfer Files to a Linux Hyper-V Guest

You’ve got a straightforward problem. You have a file on your Windows machine. You need to get that file into your Linux machine. Your Windows machine runs Hyper-V, and Hyper-V runs your Linux machine as a guest. You have many options.

Method 1) Use PowerShell and Integration Services

This article highlights the PowerShell technique as it’s the newest method, and therefore the least familiar. You’ll want to use this method when the Windows system that you’re working from hosts the target Linux machine. I’ll provide a longer list of the benefits of this method after the how-to.

Prerequisite for Copying a File into a Linux Guest: Linux Integration Services

The PowerShell method that I’m going to show you makes use of the Linux Integration Services (LIS). It doesn’t work on all distributions/versions. Check for your distribution on TechNet. Specifically, look for “File copy from host to guest”.

By default, Hyper-V disables the particular service that allows you to transfer files directly into a guest.

Enabling File Copy Guest Service in PowerShell

The cmdlet to use is Enable-VMIntegrationService. You can just type it out:
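For example (the virtual machine name is a placeholder; the service name is exactly “Guest Service Interface”):

    Enable-VMIntegrationService -VMName svlinux -Name 'Guest Service Interface'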

The Name parameter doesn’t work with tab completion, however, so you need to know exactly what to type in order to use that syntax.

You can use Get-VMIntegrationService for spelling assistance:
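For example (again, substitute your own virtual machine name):

    Get-VMIntegrationService -VMName svlinux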

Enable-VMIntegrationService includes a VMIntegrationService parameter that accepts an object, which can be stored in a variable or piped from Get-VMIntegrationService:
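Something like this, as a sketch:

    Get-VMIntegrationService -VMName svlinux |
        where Name -eq 'Guest Service Interface' |
        Enable-VMIntegrationService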

You could leave out the entire where portion and pipe directly in order to enable all services for the virtual machine in one shot.

Use whatever method suits you best. You do not need to power cycle the virtual machine or make any other changes.

Enabling File Copy Guest Service in Hyper-V Manager or Failover Cluster Manager

If you’d prefer to use a GUI, either Hyper-V Manager or Failover Cluster Manager can help. To enable file copy for a guest in Hyper-V Manager or Failover Cluster Manager, open the Settings dialog for the virtual machine. It does not matter which tool you use. The virtual machine can be On or Off, but it cannot be Saved or Paused.

In the dialog, switch to the Integration Services tab. Check the box for Guest services and click OK.

lincopy_servicescheck

You do not need to power cycle the virtual machine or make any other changes.

Verifying the Linux Guest’s File Copy Service

You can quickly check that the service in the guest is prepared to accept a file from the host:
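One quick way; the daemon name varies slightly by distribution, so grep for the common part:

    ps -ef | grep hyper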

Look in the output for hypervfcopyd:

lincopy_serviceverify

Of course, you can supply more of the name to grep than just “hyper” to narrow it down, but this is easier to remember.

Using Copy-VMFile to Transfer a File into a Linux Guest

All right, now the prerequisites are out of the way. Use Copy-VMFile:
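A sketch with placeholder names and paths:

    Copy-VMFile -VMName svlinux -SourcePath 'C:\Transfer\myfile.txt' -DestinationPath '/home/myaccount/myfile.txt' -FileSource Host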

You can run Copy-VMFile remotely:
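The same operation against a remote host might look like this (remember, SourcePath is a path on that host):

    Copy-VMFile -ComputerName svhv01 -VMName svlinux -SourcePath 'C:\Transfer\myfile.txt' -DestinationPath '/home/myaccount/myfile.txt' -FileSource Host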

Notice that SourcePath must be from the perspective of ComputerName. Tab completion won’t work remotely, so you’ll need to know the precise path of the source file. It might be easier to use Enter-PSSession first so that tab completion will work.

You can create a directory on the Linux machine when you copy the file:
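For instance, to have it create a single /downloads folder on the target if it doesn’t already exist (a sketch; CreateFullPath is a switch):

    Copy-VMFile -VMName svlinux -SourcePath 'C:\Transfer\myfile.txt' -DestinationPath '/downloads/myfile.txt' -FileSource Host -CreateFullPath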

CreateFullPath can only create one folder. If you ask it to create a directory tree (ex: a DestinationPath under '/downloads/new' where neither folder exists), you’ll get an error that includes the text “failed to initiate copying files to the guest: Unspecified error (0x80004005)”.

Benefits and Notes on Using Copy-VMFile for Linux Guests

Some reasons to choose Copy-VMFile over alternatives:

  • I showed you how to use it with the VMName parameter, but Copy-VMFile also accepts VM objects. If you’ve saved the output into a variable from Get-VM or some other cmdlet that produces VM objects, you can use that variable with Copy-VMFile’s VM parameter instead of VMName.
  • The VMName and VM parameters accept arrays, so you can copy a file into multiple virtual machines simultaneously.
  • You do not need a functioning network connection within the Linux guest or between the host and the guest.
  • You do not need to open firewalls or configure any daemons inside the Linux guest.
  • The transfer occurs over the VMBus, so only your hardware capabilities can limit its speed.
  • The transfer operates under the root account, so you can place a file just about anywhere on the target system.

Notes:

  • As mentioned in the preceding list, this process runs as root. Be careful what you copy and where you place it.
  • Copied files are marked as executable for some reason.
    lincopy_executable
  • Copy-VMFile only works from host to guest. The existence of the FileSource parameter implies that you can copy files in the other direction, but that parameter accepts no value other than Host.

Method 2) Using WinSCP

I normally choose WinSCP for moving files to/from any Linux machine, Hyper-V guest or otherwise.

If you choose the SCP protocol when connecting to a Linux system, it will work immediately. You won’t need to install any packages first:

lincopy_scp

Once connected, you have a simple folder display for your local and target machines with simple drag and drop transfer functionality:

lincopy_winscpmain

You can easily modify the permissions and execute bit on a file (as long as you have permission):

lincopy_fileprops

You can use the built-in editor on a file or attach it to external editors. It will automatically save the output from those editors back to the Linux machine:

lincopy_edit

You can even launch a PuTTY session right from WinSCP (if PuTTY is installed):

lincopy_putty

I still haven’t found all of the features of WinSCP.

Method 3) Move Files to/from Linux with the Windows FTP Client

Windows includes a command-line ftp client. It has many features, but still only qualifies as barely more than rudimentary. You can invoke it with something like:
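For example, against a host name or IP address of your choosing:

    ftp 192.168.25.41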

The above will attempt to connect to the named host and will then start an interactive session. If you’d like to start work from within an interactive session, that would look something like this:
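That approach looks something like this:

    C:\> ftp
    ftp> open 192.168.25.41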

Use ftp /? at the command prompt for command-line assistance and help at the interactive ftp> prompt for interactive assistance.

You’ll have a few problems using this or any other standard FTP client: most Linux distributions do not ship with any FTP daemon running. Most distributions allow you to easily acquire vsftpd. I don’t normally do that because SCP is already enabled and it’s secure.

Method 4) Move Files Between Linux Guests with a Transfer VHDX

If you have a distribution that doesn’t work with Copy-VMFile, or you just don’t want to use it, you can use a portable VHDX file instead.

First, create a disk. Use PowerShell so that the sparse files don’t cause the VHDX file to grow larger than necessary:
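A sketch, with a placeholder path and size; a dynamically-expanding VHDX keeps the file small until you actually put data on it:

    New-VHD -Path 'C:\Transfer\transfer.vhdx' -SizeBytes 10GB -Dynamic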

  1. Attach the VHDX to the Linux guest. If you attach to the virtual SCSI chain, you don’t need to power down the VM.
  2. Inside the Linux guest, create an empty mount location.
  3. Determine which of the attached disks can be used for transfer with sudo fdisk -l . You are looking for a /dev/sd* item that has FAT32 partition information.
    Do not use:
    lincopy_diskinuse
    Use:
    lincopy_disknotinuse
  4. Partition the disk by entering the fdisk commands shown in the sketch that follows this list. The outputs will show you what you’re doing; I’m only telling you what to type.
  5. Run sudo fdisk -l to verify that your new disk now has a W95 FAT32 partition. You need FAT32, as it’s the only file system that both Linux and Windows can use without extra effort that isn’t worth it for a transfer disk.
  6. Format your new partition (also shown in the sketch below).
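A sketch of steps 4 through 6, assuming the new disk showed up as /dev/sdb, that /transfer is the empty mount point from step 2, and that mkfs.vfat is available (it comes from the dosfstools package, which you may need to install):

    sudo mkdir /transfer
    sudo fdisk /dev/sdb      # at the prompts: n, p, 1, accept the defaults, then t, b, then w
    sudo mkfs.vfat /dev/sdb1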

You have successfully created your transfer disk.

Use a Transfer Disk in Linux

To use a transfer disk on the Linux side, you need to attach it to the Linux machine. Then you need to mount it:

  1. Use sudo fdisk -l to verify which device Linux has assigned the disk to. Use the preceding section for hints.
  2. Once you know which device it is, mount it to your transfer mount point: sudo mount /dev/sdb1 /transfer. Move/copy files into/out of the /transfer folder.
  3. Once you’re finished, unmount the disk from the folder: sudo umount /transfer (or, equivalently, sudo umount /dev/sdb1).
  4. Detach the VHDX from the virtual machine.

Use a Linux Transfer Disk in Windows

You mount a VHDX in Windows via Mount-VHD (must be running Hyper-V), Mount-DiskImage, or Disk Management. Once mounted, work with it as you normally would. Mount-VHD and Disk Management will attach it to a unique drive letter; Mount-DiskImage will mount to the empty path that you specify. Once you’re finished working with it, you can use Dismount-VHD, Dismount-DiskImage (don’t forget -Save!), or Disk Management.
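As a quick sketch of the PowerShell route, with a placeholder path:

    Mount-VHD -Path 'C:\Transfer\transfer.vhdx'
    # ... copy files to/from the mounted volume ...
    Dismount-VHD -Path 'C:\Transfer\transfer.vhdx'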

Be aware that even though Windows should have no trouble reading a FAT32 partition/volume created in Linux, the opposite is not true! Do not use Windows formatting tools for a Linux transfer disk! Your mileage may vary, but formatting in Linux always works, so stick to that method.

 

Troubleshooting After a Hyper-V Cluster Node Replacement

It’s simple to add and remove nodes in a Microsoft Failover Cluster. That ease can hide a number of problems that can spell doom for your virtual machines, though. I’ve put together a quick checklist to run through if your last node addition or replacement didn’t go as smoothly as expected.

1. Did You Validate Your Cluster?

Cluster validation can be annoying and it can suck up time that you feel you can’t spare, but it must be done.

troublenode_validation

Cluster validation runs far more tests in far less time than you could ever hope to do on your own. It’s not perfect by any means, and it can seem obnoxiously overbearing in some regards, but it can also point out issues before they break your cluster.

Cluster validation can be performed while the cluster is in production, with one exception: storage. None of the storage tests can be safely run against online virtual machines unless they’re on SMB storage. If you can move them to a suitable file share, do so. If you can’t, then schedule a time when you can save all of the virtual machines for a few minutes. If you can’t do that either, then run validation without the storage tests. It’s better to have a partial test than none at all.

2. Do You Need to Clear the Node?

Sometimes, you can’t even re-add a node. You get a message that the computer is already in a cluster, and the wizard blocks you from proceeding! From any node that’s still in the cluster, run this from an elevated PowerShell prompt:
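The original command isn’t preserved in this copy. The usual fix is to evict the stale membership record for the node you’re trying to re-add; assuming the rebuilt node is named SVHV02, that would be:

    Remove-ClusterNode -Name SVHV02 -Force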

I’ve had to take that step every time, regardless of what else I did in advance.

3. Did You Fix DNS?

Hyper-V cluster nodes typically use at least two IP addresses: Management and Live Migration. You might well be using at least one other for cluster communications. If you’re connected via iSCSI, there will be at least one more IP address there. Many of those IPs may reside on isolated IP networks that don’t utilize a router. That makes those IPs unreachable from other IP networks. If those IPs are being registered in DNS, then it’s only a matter of time before they cause problems.

I typically design a custom script to assign IPs to a host and ensure that it only registers the management address in DNS. You can make those changes manually, if you prefer. Just remember to get it done.

4. Did You Match All Windows Roles and Features?

On a recent rebuild, I moved quickly due to a great deal of urgency. All seemed well, but then I tested my first Live Migration to the rebuilt node, and it crashed the VM! The error code was: 0x80070780 (The file cannot be accessed by the system). It didn’t say which file (because, you know, why would that be useful information?), so I began by verifying that all of the VM’s files were in the same highly available location.

I’ll spare you the details of my fairly frantic searching, but it turned out that I had neglected to update my deployment script and had missed one very critical component: this cluster hosted virtual desktops, so each node ran the Data Deduplication role… except the one that I had newly rebuilt. I quickly whipped out a role/feature deployment script that I keep on hand, and all was well.
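I can’t reproduce my exact deployment script here, but a quick way to catch this kind of drift is to compare the installed roles and features on a known-good node against the rebuilt one (hypothetical node names):

    $good    = (Get-WindowsFeature -ComputerName SVHV01 | Where-Object Installed).Name
    $rebuilt = (Get-WindowsFeature -ComputerName SVHV02 | Where-Object Installed).Name
    Compare-Object $good $rebuilt

Anything present only on the reference node can then be added to the rebuilt one with Install-WindowsFeature.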

5. Do You Need to Fix Permissions?

Always be on the lookout for the lovely 0x80070005 — otherwise known as “access denied”. When you rebuild a cluster node using the same name, it should slide right back into Active Directory without any fuss. Deleting the Active Directory object before re-adding the node doesn’t really help things, so I’d avoid that. Either way, you might need to rebuild permissions. I would pay special attention to delegation. I wouldn’t spend a great deal of time guessing at it. If you think delegation might be an issue, then apply the fix and test.

Usually, you do not need to re-apply file level permissions after a node add/rebuild. If you feel that it’s necessary, I would work at the containing folder level as much as possible. It can be maddening trying to set ACLs on individual virtual machine locations.

6. Are You Having an SPN Issue?

Look in the Event Viewer on other nodes for event ID 4 from Security-Kerberos regarding failures around Kerberos tickets and SPNs (service principal names). This can happen whether or not you deleted the Active Directory object beforehand, although it seems to sort itself out more easily when you re-use the existing object.

If you continue having trouble with this message, you’ll find many references and fix suggestions by searching on the event ID and text. Everywhere that I went, I saw different answers. No one seemed to have gathered a nice little list of things to try.

7. Did You Set Your PowerShell Execution and Remoting Policies?

I have a long list of “issues” that I solve by group policy. If you’re not doing that, then you could miss a number of small things. For instance, if you have built up a decent repertoire of PowerShell scripts to handle automation, you might suddenly find that they don’t work after a node replacement. This should help (if run from an elevated PowerShell prompt):
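The original listing isn’t preserved here; as a sketch, the two relevant pieces are the execution policy and PowerShell remoting (adjust the policy to whatever your organization has standardized on):

    Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Force
    Enable-PSRemoting -Force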

Of course, the battle over what constitutes the “best” PowerShell execution policy continues to rage on, and will likely do so for as long as people like to argue. “RemoteSigned” has served me well. Use what you like. Just remember that more restrictive policies, such as “Restricted” or “AllSigned”, will block your unsigned scripts.

8. Do You Just Need To Rebuild the Cluster?

I have never once needed to completely destroy and rebuild a cluster outside of my test lab. I wouldn’t take such a “nuclear” option off of the table, however. Each cluster node maintains its own small database about the cluster configuration. If you’re using a quorum mode that includes disk storage, a copy of the database exists there as well. Like any other database, I’m forced to accept the possibility that the database could become damaged beyond repair. If that happens, a complete rebuild might be your best bet.

Try all other options first.

If you must destroy a cluster to rebuild it, remember this:

  • The contents of CSVs and cluster disks will not be altered, but you won’t be able to keep them online.
  • If a cluster’s virtual machines are kept on SMB shares, they can remain online during the rebuild through careful adding and removing. You can add and remove HA features to/from a virtual machine without affecting its running state.
  • You must run the Clear-ClusterNode command against each node.
  • Delete the HKLM\Cluster key before re-adding a node.
  • Format any witness disks before re-adding to a cluster.
  • Delete any cluster-related data from witness shares before re-adding to a cluster.

Fortunately, Microsoft made their clustering technology simple enough that such drastic measures should never be necessary.

 

Comparing Hyper-V Generation 1 and 2 Virtual Machines

The 2012 R2 release of Hyper-V introduced a new virtual machine type: Generation 2. The words in that designation don’t convey much meaning. What are the generations? What can the new generation do for you? Are there reasons not to use it?

We’ll start with an overview of the two virtual machine types separately.

Generation 1 Virtual Machines: The Legacy

The word “legacy” often invokes a connotation of “old”. In the case of the Generation 1 virtual machine, “old” paints an accurate picture. The virtual machine type isn’t that old, of course. However, the technology that it emulates has been with us for a very long time.

BIOS

BIOS stands for “Basic Input/Output System”, which doesn’t entirely describe what it is or what it does.

A computer’s BIOS serves two purposes:

  1. Facilitates the power-on process of a computer. The BIOS initializes all of the system’s devices, then locates and loads the operating system.
  2. Acts as an interface between the operating system and common hardware components. Even though there are multiple vendors supplying their own BIOSes, all of them provide the same command set for operating systems to access. Real mode operating systems used those common BIOS calls to interact with keyboards, disk systems, and text output devices. Protected mode operating systems do not use BIOS for this purpose. Instead, they rely on drivers.

Hyper-V creates a digital BIOS that it attaches to all Generation 1 virtual machines.

Emulated Hardware

A virtual machine is a fake computer. Faking a computer is hard. A minimally functional computer requires several components. Have you ever looked at a motherboard and wondered what all of those chips do? I’m sure you recognize the CPU and the memory chips, but what about the others? Each has its own important purpose. Each contributes something. A virtual machine must fake them all.

One of the ways a virtual machine can fake hardware is emulation. Nearly every computer component is a digital logic device. That means that each of them processes data in binary using known, predictable methods. Since we know what those components do and since they accept and produce binary data, we can make completely digital copies of them. When we do that, we say that we have emulated that hardware. Emulated hardware is a software construct that produces behavior identical to the “real” item. If you look at Device Manager inside a Generation 1 virtual machine, you can see evidence of emulation:

gens_legacyhardware

Digitally, the IDE controller in a Hyper-V virtual machine behaves exactly like the Intel 82371AB/EB series hardware. Because almost all operating systems include drivers that can talk to Intel 82371AB/EB series hardware, they can immediately work inside a Hyper-V Generation 1 VM.

Emulated hardware provides the benefit of widespread compatibility. Very few operating systems exist that can’t immediately work with these devices. They also tend to work in the minimalist confines of PXE (pre-boot execution environment). For this reason, you’ll often see requirements to use a Generation 1 virtual machine with a legacy network adapter. The PXE system can identify and utilize that adapter; it cannot recognize the newer synthetic adapter.

Generation 2 Virtual Machines: A Step Forward

BIOS works very well, but it has a number of limitations. On that list, the most severe is security; BIOS knows how to load boot code from devices, and that’s it. It cannot make any judgment on whether or not the boot code that it found should be avoided. When it looks for an operating system on a hard disk, that hard disk must use a master boot record (MBR) partition layout, or BIOS won’t understand what to do. MBR imposes a limit of four primary partitions and 2 TB of addressable space.

UEFI

Enter the Unified Extensible Firmware Interface (UEFI). As a successor to BIOS, it can do everything that BIOS can do. On some hardware systems, it can emulate a BIOS when necessary. There are three primary benefits to choosing UEFI over BIOS:

  1. Secure Boot. UEFI can securely store an internal database of signatures for known good boot loaders. If a boot device presents a boot loader that the UEFI system doesn’t recognize, it will refuse to boot. Secure Boot can be an effective shield against root kits that hijack the boot loader.
  2. GPT disk layout. The GUID partition table system (GPT) has been available for some time, but only for data disks. BIOS can’t boot to it. UEFI can. GPT allows for 128 partitions and a total disk size of 8 zettabytes, dramatically surpassing MBR.
  3. Extensibility. Examine the options available in the firmware screens of a UEFI physical system. Compare them to any earlier BIOS-only system. UEFI allows for as many options as the manufacturer can fit onto their chips. Support for hardware that didn’t exist when those chips were soldered onto the mainboard might be the most important.

When you instruct Hyper-V to create a Generation 2 virtual machine, it uses a UEFI construct.
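The generation is chosen at creation time and can’t be changed afterward. As a minimal sketch of creating a Generation 2 VM in PowerShell (the name, path, and sizes are placeholders):

    New-VM -Name demo-g2 -Generation 2 -MemoryStartupBytes 2GB -NewVHDPath 'C:\LocalVMs\Virtual Hard Disks\demo-g2.vhdx' -NewVHDSizeBytes 60GB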

Synthetic Hardware

Synthetic hardware diverges from emulated hardware in its fundamental design goal. Emulated hardware pretends to be a known physical device to maximize compatibility with guest operating environments and systems. Hypervisor architects design synthetic hardware to maximize interface capabilities with the hypervisor. They release drivers for guest operating systems to address the compatibility concerns. The primary benefits of synthetic hardware:

  • Controlled code base. When emulating hardware, you’re permanently constrained by that hardware’s pre-existing interface. With synthetic hardware, you’re only limited to what you can build.
  • Tight hypervisor integration. Since the hypervisor architects control the hypervisor and the synthetic device, they can build them to directly interface with each other, bypassing the translation layers necessary to work with emulated hardware.
  • Performance. Synthetic hardware isn’t always faster than emulated hardware, but the potential is there. In Hyper-V, the SCSI controller is synthetic whereas the IDE controller is emulated, but performance differences can only be detected under extremely uncommon conditions. Conversely, the synthetic network adapter is substantially faster than the emulated legacy adapter.

Generation 2 virtual machines can use less emulated hardware because of their UEFI platform. They can boot from the SCSI controller because UEFI understands how to communicate with it; BIOS does not. They can boot using PXE with a synthetic network adapter because UEFI understands how to communicate with it; BIOS does not.

Reasons to Use Generation 1 Over Generation 2

There are several reasons to use Generation 1 instead of Generation 2:

  • Older guest operating systems. Windows/Windows Server operating systems prior to Vista/2008 did not understand UEFI at all. Windows Vista/7 and Windows Server 2008/2008 R2 do understand UEFI, but require a particular component that Hyper-V does not implement. Several Linux distributions have similar issues.
  • 32-bit guest operating systems. UEFI began life with 64-bit operating systems in mind. Physical UEFI systems will emulate a BIOS mode for 32-bit operating systems. In Hyper-V, Generation 1 is that emulated mode.
  • Software vendor requirements. A number of systems built against Hyper-V were designed with the limitations of Generation 1 in mind, especially if they target older OSes. Until their manufacturers update for newer OSes and Generation 2, you’ll need to stick with Generation 1.
  • VHD requirements. If there is any requirement at all to use VHD instead of VHDX, Generation 1 is required. Generation 2 VMs will not attach VHDs. If a VHDX doesn’t exceed VHD’s limitations, it can be converted. That’s certainly not convenient for daily operations.
  • Azure interoperability. At this time, Azure uses Generation 1 virtual machines. You can use Azure Recovery Services with a Generation 2 virtual machine, but it is down-converted to Generation 1 when you fail to Azure and then up-converted back to Generation 2 when you fail back. If I had any say in designing Azure-backed VMs, I’d just use Generation 1 to make things easier.
  • Virtual floppy disk support. Generation 2 VMs do not provide a virtual floppy disk. If you need one, then you also need Generation 1.
  • Virtual COM ports. Generation 2 VMs do not provide virtual COM ports, either.

I’ll also add that a not-insignificant amount of anecdotal evidence exists that suggests stability problems with Generation 2 virtual machines. In my own experiences, I’ve had Linux virtual machines lose vital data in their boot loaders that I wasn’t able to repair. I’ve also had some networking glitches in Generation 2 VMs that I couldn’t explain that disappeared when I rebuilt the guests as Generation 1. From others, I’ve heard of VHDX performance variances and some of the same issues that I’ve seen. These reports are not substantiated, not readily reproducible, and not consistent. I’ve also had fewer problems with Generation 2 VMs when using Hyper-V 2016 and newer Linux kernels.

Reasons to Use Generation 2 Over Generation 1

For newer builds that do not have any immediately show-stopping problems, I would default to using Generation 2. Some concrete reasons:

  • Greater security through Secure Boot. Secure Boot is the primary reason many opt for Generation 2. There are at least two issues with that, though:
    • Admins routinely make Secure Boot pointless. Every time someone says that they have a problem with Secure Boot, the very first suggestion is: “disable Secure Boot”. If that’s going to be your choice, then just leave Secure Boot unchecked. Secure Boot has exactly one job: preventing a virtual machine from booting from an unrecognized boot loader. If you’re going to stop it from doing that job, then it’s pointless to enable it in the first place.
    • Secure Boot might not work. Microsoft made a mistake. Maybe ensuring that your Hyper-V host stays current on patches will prevent this from negatively affecting you. Maybe it won’t.
  • Greater security through TPM, Device Guard, and Credential Guard. Hyper-V 2016 can project a virtual Trusted Platform Module (TPM) into Generation 2 virtual machines. If you can make use of that, Generation 2 is the way to go. I have not yet spent much time exploring Device Guard or Credential Guard, but here’s a starter article if you’re interested: https://blogs.technet.microsoft.com/ash/2016/03/02/windows-10-device-guard-and-credential-guard-demystified/.
  • Higher limits on vCPU and memory assignment. 2016 adds support for extremely high quantities of vCPU and RAM for Generation 2 VMs. Most of you won’t be building VMs that large, but for the rest of you…
  • PXE booting with a synthetic adapter. Generation 1 VMs require a legacy adapter for PXE booting. That adapter is quite slow. Many admins will deal with this by using both a legacy adapter and a synthetic adapter, or by removing the legacy adapter post-deployment. Generation 2 reduces this complexity by allowing PXE-booting on a synthetic adapter.
  • Slightly faster boot. I’m including this one mainly for completeness. UEFI does start more quickly than BIOS, but you’d need to be rebooting a VM a lot for it to make a real difference.

How to convert from Generation 1 to Generation 2

One of the many nice things about Hyper-V is just how versatile virtual hard disks are. You can just pop them off of one virtual controller and slap them onto another, no problems. You can disconnect them from one VM and attach them to another, no problems. Unless it’s the boot/system disk. Then, often, many problems. Generation 1 and 2 VMs differ in several ways, but boot/system disk differences are the biggest plague for trying to move between them.

I do not know of any successful efforts made to convert from Generation 2 to Generation 1. It’s possible on paper, but it would not be easy.

You do have options if you want to move from Generation 1 to Generation 2. The most well-known is John Howard’s PowerShell “Convert-VMGeneration” solution. This script has a lot of moving parts and does not work for everyone. Do not expect it to serve as a magic wand. Microsoft does not provide support for Convert-VMGeneration.

Microsoft has released an official tool called MBR2GPT. It cannot convert a virtual machine at all, but it can convert an MBR VHDX to GPT. It’s only supported on Windows 10, though, and it was not specifically intended to facilitate VM generation conversion. To use it for that purpose, I would detach the VHDX from the VM, copy it, mount it on a Windows 10 machine, and run MBR2GPT against the copy. Then, I would create an all-new Generation 2 VM and attach the converted disk to it. If it didn’t work, at least I’d still have the original copy.
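A sketch of that workflow, assuming a Windows 10 machine with the Hyper-V PowerShell module available, placeholder paths, and that the mounted copy shows up as disk 2 (check with Get-Disk):

    Copy-Item 'D:\VMs\oldvm.vhdx' 'D:\VMs\oldvm-gpt.vhdx'
    Mount-VHD -Path 'D:\VMs\oldvm-gpt.vhdx'
    mbr2gpt /validate /disk:2 /allowFullOS
    mbr2gpt /convert /disk:2 /allowFullOS
    Dismount-VHD -Path 'D:\VMs\oldvm-gpt.vhdx'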

Keep in mind that any such conversions are a major change for the guest operating system. Windows has never been fond of changes to its boot system. Conversion to GPT is more invasive than most other changes. Be pleasantly surprised at every successful conversion.

 

How to Monitor Hyper-V with Nagios: CentOS Edition


Are you monitoring your systems yet? I’m fairly certain that I make a big deal about that every few articles, don’t I? Last year, I wrote an article showing how to install and configure Nagios on Ubuntu Server. I’ve learned a few things in the interim. I also realize that not everyone wants to use Ubuntu Server. So, I set out on a mission to deploy Nagios on another popular distribution: CentOS. I’m here to share the procedure with you. Follow along to learn how to create your own monitoring system with no out-of-pocket cost. If you’re new to CentOS Linux on Hyper-V, we suggest you check out the earlier post on that subject.

Included in this Article

This article is very long because I believe in detailed instructions that help the reader understand why they’re typing something. It looks much worse than it is. I’ll cover:

  • A brief description of Nagios Core, the monitoring system
  • A brief description of NSClient++, the agent that operates on Windows/Hyper-V Servers to enable monitoring
  • Configuring a CentOS system to operate Nagios
  • Acquiring and installing the Linux packages
  • A discussion on the security of the NRPE plugin. Take time to skip down to this section and read it. The NSClient++/NRPE configuration that I will demonstrate in this article presents a real security concern. I believe that the risk is worthwhile and manageable, but you and/or your security team may disagree. Decide before you start this project, not halfway through.
  • Acquiring and installing the Windows packages
  • Configuring basic Nagios monitoring
  • Nagios usage

Not Included in this Article

While comprehensive, this article isn’t entirely all-encompassing. I am going to get you started on configuring monitoring, but you’re going to need to do some tinkering and investigation on your own. Building an optimal Nagios system requires practice more than rote instruction and memorization.

I have designed several monitoring scripts specifically for Hyper-V and failover clusters. These can be found in our subscriber’s area. Currently, the list includes:

  • Checking the free space of a Cluster Shared Volume
  • Checking the status of a Cluster Shared Volume (Redirected Access, etc.)
  • Checking the age of checkpoints
  • Checking the expansion percentage of dynamically-expanding VHD/Xs
  • Checking the health of a quorum witness

Since Nagios will alert you if any of these resources get into trouble, you can begin using those features without fear that they’ll break something while you’re not paying attention. Some of those monitors are shown in this grab of Nagios’ web interface:

Nagios Sample Data

What is Nagios Core?

Nagios Core is an open source software tool that can be used to monitor network-connected systems and devices. It processes data from sensors and separates the results into categories. By name, these categories are OK, Warning, and Critical. By default, Nagios Core sends a repeating e-mail when a sensor is in a persistent Warning or Critical state and a single “Recovery” e-mail when it has returned to the OK state.

Sensors collect data by “active” and/or “passive” checks. Nagios Core initiates active checks by periodically triggering plug-ins. Passive checks are when remote processes “call home” to the Nagios Core system to report status to a plug-in. The plug-in then delivers the sensor data to Nagios Core.

These plug-ins give Nagios Core its flexibility. Several plugins ship alongside Nagios Core. The Nagios community makes others available separately. Some are only included with the paid Nagios XI, which I will not cover. A plug-in is simply a Linux executable that collects information in accordance with its programming and returns data in a format that Nagios can parse.

Nagios Core provides multiple configurable options. One that we will be using is its web interface — a tiny snippet is shown in the screenshot above. This interface is not required, but grants you the ability to visually scan your environment from an overview level down to the individual sensor level. It also gives you other abilities, such as “Acknowledging” a Warning or Critical state and re-scheduling pending checks to make the next one occur very quickly (for testing) or much later (for repairs).

Isn’t Nagios Core Difficult?

Nagios Core has a reputation for being difficult to use, which I don’t think is appropriate. I believe that it got that reputation because you configure it with text files instead of in some flowery GUI. Nagios XI adds simpler configuration, but many will find that the cost jump from Core to XI makes editing text files more attractive.

Fortunately, the default installation includes templates that not only show you exactly what you need to do, but also give you the ability to set things up via copy/paste and only a bit of typing. Personally, I found the learning curve to be very steep but also very short. Overall, I find Nagios Core much easier to use than the monitoring component of Microsoft’s full-blown Systems Center Operations Manager.

From here on out, I’m only going to use “Nagios” to mean “Nagios Core”.

What is NSClient++?

NSClient++ is a small service application that resides on Windows systems and interacts with a remote Nagios system. Since Nagios runs on Linux, it cannot perform a number of common Windows tasks. NSClient++ bridges the gap. Of its many features, we will be using it as a target for the “check_nt” and “check_nrpe” Nagios plug-ins. Upon receiving active check queries from these two plug-ins, it performs the requested checks and returns the data to those plug-ins.

Prerequisites for Installing Nagios and NSClient++

I’ve done my best to make this a one-stop experience. You’ll need to bring these things for an optimal experience:

  • One installation of CentOS. You can use a virtual machine, with these guidelines:
    • 2 vCPU, 512MB startup RAM, 256MB minimum RAM, 1GB maximum RAM, and a 40GB disk. Mine uses around 600MB of RAM in production, a negligible amount of CPU, and the VHDX has remained under 2GB.
    • Assign a static IP or use a DHCP reservation. You will be configuring NSClient++ to restrict queries to that IP.
    • If you only have one Hyper-V host, find some piece of hardware to use for Nagios. If you don’t have anything handy, check Goodwill or garage sales. You don’t want your monitoring system to be dependent on your only Hyper-V system.
    • I recommend against clustering the virtual machine that holds your Nagios installation. The less it depends upon, the better. In a 2-node Hyper-V cluster, I configure one Nagios system on internal storage on one node and a second Nagios system on the second node that does nothing but monitor the first.
    • Refer to my prior article if you need assistance installing CentOS. Includes instructions for running it in Hyper-V.
  • Download NSClient++ for the Windows/Hyper-V Servers to monitor. If you only have a few systems, the MSI will be the easiest to work with. If you have many, you might want to get the ZIP for Robocopy distribution.
    • Note: If using the ZIP, install the latest VC++ redistributable on target systems. Without the necessary DLLs, the NSClient service will not run and does not have the ability to throw any errors to explain why it won’t run.
  • NRPE (Nagios Remote Plugin Executor). This will run on the Nagios system.
  • WinSCP (optional). You can get by without WinSCP, but it makes Nagios administration much easier. See my previously linked article on CentOS for a WinSCP primer.
  • PuTTY (optional). You could also get by without PuTTY, if you absolutely had to. I wouldn’t try it. The linked CentOS article includes a primer for PuTTY as well.
  • Download Nagios Core and the Nagios plugins from nagios.org to your management computer. More detailed instructions follow.

Software Versions in this Document

This article was written using the following software versions:

  • Nagios Core 4.3.1
  • Nagios Plugins 2.2.1
  • NSClient++ 0.5.0.62
  • NRPE 3.1.0. For our purposes, a 2.x version would be fine as well because v3.x needs to downgrade its packets to talk to NSClient++ anyway.

Downloading Nagios Core and Transferring it to the Target System

Start on Nagios.org. I’ll give step-by-step instructions that worked for me. Seeing as how this is the Internet, things might be different by the time you read these words. Your goal is to download Nagios Core and the Nagios Plugins.

  1. From the Nagios home page, hover over Downloads at the top right of the menu. Click Nagios Core.
  2. You’ll be taken to the editions page. Under the Core column, click Download.
  3. If you want to fill in your information, go ahead. Otherwise, there’s a Skip to download link.
  4. You should now be looking at a table with the latest release and the release immediately prior. At the far right of the table are the download links. For reference, the version that I downloaded said nagios-4.3.1.tar.gz. Click the link to begin the download. Don’t close this window.
  5. After, or while, the main package is downloading, you can download the plugins. You can hover over Downloads and click Nagios Plugins, or you can scroll down on the main package download screen to Step 2 where you’ll find a link that takes you to the same page.
  6. You should now be looking at a similar table that has a single entry with the latest version of Nagios tools. The link is at the far right of this table; the one that I acquired was nagios-plugins-2.2.1.tar.gz. Download the current version.
  7. If you didn’t already download NRPE, do so now.
  8. Connect to your target system in WinSCP (or whatever other tool that you like) and transfer the files to your user’s home folder. I tend to create a Downloads folder (keep in mind that Linux is case-sensitive), but it doesn’t really matter if you create a folder or what you call it as long as you can navigate the system well enough to find the files.
    nagoncent_nagfiles

Note: You could use the wget application to download directly to your CentOS system. I never download anything from the Internet directly to a server.

Prepare CentOS for Nagios

Nagios depends on a number of other packages for its installation and operation. These steps were tested on a standard deployment of CentOS, but should also work on a minimal build.

Download and install prerequisite packages:
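The exact package list wasn’t preserved in this copy; the set below is the commonly documented one for building Nagios Core on CentOS 7, so treat it as a sketch:

    sudo yum install -y gcc glibc glibc-common make perl gd gd-devel net-snmp openssl-devel unzip httpd php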

Then, we’ll create a user for operating Nagios.
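Something like:

    sudo useradd nagios
    sudo passwd nagios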

Upon entering the passwd command, you’ll be asked to provide a password. I don’t want to tell you what to do, but you should probably keep note of it.

Next, we’ll create a security group responsible for managing Nagios and populate it with the new “nagios” user and the account that Apache runs under.
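A sketch, using “nagcmd” as the group name; any name works as long as you use the same one in the configure step later:

    sudo groupadd nagcmd
    sudo usermod -a -G nagcmd nagios
    sudo usermod -a -G nagcmd apache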

Install or Upgrade Nagios on CentOS

Now we’re ready to compile and install Nagios.

First, you need to extract the files.

Note: I am not using sudo for the extraction! If you run the extraction with sudo, then you will always need to use sudo to manipulate the extracted files.

Note: I am using the directory structure and versions from my WinSCP screenshot earlier. If you placed your files elsewhere or have newer versions, this is an example instead of something you can copy/paste.
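With those caveats in mind, the extraction looks like this:

    cd ~/Downloads
    tar -xzf nagios-4.3.1.tar.gz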

Execute the following to build and install Nagios.

Note: If upgrading, STOP after sudo make install or your config files will be renamed and replaced with the new defaults!

Note: Do not copy/paste this entire block at once. Run each line individually and watch for errors in the output.
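The build and install sequence, as a sketch; note the nagcmd group from earlier:

    cd ~/Downloads/nagios-4.3.1
    ./configure --with-command-group=nagcmd
    make all
    sudo make install
    sudo make install-commandmode
    sudo make install-init
    sudo make install-config
    sudo make install-webconf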

Only if new install. Set Nagios to start automatically when the system starts.
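Depending on how the install-init step registered Nagios on your system, one of these will do it:

    sudo systemctl enable nagios
    # or, on init-script installs:
    sudo chkconfig nagios on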

Only if upgrading. Instruct CentOS to refresh its daemons and restart the newly replaced Nagios executable.
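For example:

    sudo systemctl daemon-reload
    sudo systemctl restart nagios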

Install or Upgrade Nagios Plugins on CentOS

The plugin installation process is similar to the Nagios installation process, but shorter and easier.

Unpack the files first. The same notes from the Nagios section are applicable.
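For example:

    cd ~/Downloads
    tar -xzf nagios-plugins-2.2.1.tar.gz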

Compile and install the plugins. The same notes from the Nagios section are applicable, especially the bit about taking this one line at a time.
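A sketch:

    cd ~/Downloads/nagios-plugins-2.2.1
    ./configure --with-nagios-user=nagios --with-nagios-group=nagios
    make
    sudo make install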

You’ve just installed several plugins for Nagios, most of which I’m not going to show you how to use. If you’d like to take a look, navigate to /usr/local/nagios/libexec:

nagoncent_plugins

Most of them have built-in help that you can access with -h or --help:
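For example:

    ./check_ping --help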

You can also search on the Internet for assistance and examples.


Security and the NRPE Plugin

The next step is to compile and install the NRPE plugin. Before that, we need to stop and have a serious chat about it.

You might read in some places that the NRPE plugin is a security risk. That is correct. It allows one computer to tell another computer to run a script and return the results. Furthermore, we’re going to be sending arguments (essentially, parameters) to those scripts. Doing so opens the door to injection attacks. One method that has been used to combat the issue is NRPE traffic encryption. I am not going to be exploring how to encrypt NRPE communications at this time.

I have several reasons for this:

  •  The simplest reason is that it’s difficult to do and I’m not certain of how much value is in the effort.
  • Encryption is often mistaken for data security when it is, in fact, more about data privacy. For example, if you transmit your password in encrypted format and the packet is intercepted, the attacker still has your password. The fact that it’s encrypted might be enough to put the attacker off, but any encryption can be broken with sufficient time and effort. Therefore, your password is only private in the sense that no casual observer will be able to see it. To keep it secure, you should not transmit it at all. We don’t really have that option. Because we are not encrypting, what an attacker could see is the command string and the result string. You’ll have full knowledge of what those are, so you can decide how serious that is to you. Our best approach is to ensure that the Nagios<->host communications chain only occurs on secured networks, even if we later enable SSL.
  • The author of NSClient++ had the good sense to ensure that you can’t operate just any old free-form script via NRPE. Scripts must be specifically defined and can be tightly controlled. If the script itself is sufficiently well-designed, a script injection attack should be prohibitively difficult. That still leaves the door open to data snooping, so take care in what data your checks return.
  • The author of NSClient++ also coded in the ability to restrict NRPE activities to specific source IP addresses. IP spoofing is possible, of course.
  • Windows, Linux, and/or hardware firewalls can help enforce the source and destination IP communications. Spoofing is still a risk, of course.
  • I ran a Wireshark trace on a Nagios-to-NSClient++ communications channel. Nothing was transmitted in clear text. There were changes made in NRPE 3.x that led me to believe that it might be performing some encryption. Then again, it might just be Base64 encoding. Either way, no casual observer will be able to snoop it.

What I didn’t address in the above points is that NSClient++ could effectively authenticate the Nagios computer by only accepting traffic that was encrypted with its private key. So, yes, NRPE is a security risk and it is a higher risk without SSL. I won’t try to convince you otherwise.

I believe that, for internal systems, the risk is very manageable. If you’re going to be connecting to remote client sites, I would put the entire Nagios communications chain inside an encrypted VPN tunnel anyway because even if you encrypt NRPE, the other traffic is clear-text. The only people that I think should worry much about this are those that will be connecting Nagios to Hyper-V hosts using unsecured networks. Personally, I’m uncertain how a case could be made to do that even with SSL configured.

I’m not saying that I’ll never look into encrypting NRPE. Just not now, not in this article.


Install or Upgrade the NRPE Plugin on CentOS

For our purposes, the NRPE plugin requires little effort to install.

If you’re not already in the folder that contains the NRPE gzip file, return there. Unpack the file just like you did with the others.
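For example:

    cd ~/Downloads
    tar -xzf nrpe-3.1.0.tar.gz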

Switch to the extracted folder. Compile and install the plugin. Part of the configure process includes creating a new SSL key. It requires several minutes to complete.
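A sketch; on the Nagios server, only the check_nrpe plugin itself is needed:

    cd nrpe-3.1.0
    ./configure
    make check_nrpe
    sudo make install-plugin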

Verify that the plugin was created:
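For example:

    ls -l /usr/local/nagios/libexec/check_nrpe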

If CentOS responded by showing you a file, then all is well.

Connecting Apache to Nagios on CentOS

At this point, Nagios works and can begin monitoring systems. You’ll need to do some extra work to get the web interface going.

Start by creating an administrative web user account. This account belongs to Apache, not CentOS. As created in this article, it will have full access to everything in the web interface.
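A sketch; htpasswd comes with Apache, and the file location matches the default Nagios web configuration:

    sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin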

You will be prompted to enter a password for the “nagiosadmin” account. Be aware that the -c parameter causes the file to be created anew. If it already exists, it will be overwritten.

You also need to add the CentOS account that Apache uses to the security group that can access Nagios’ files:
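If you didn’t already do this when you created the group:

    sudo usermod -a -G nagcmd apache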

Next, set Apache to start when the system boots:
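On CentOS 7, Apache is a systemd unit:

    sudo systemctl enable httpd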

Allow port 80 through the firewall:
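With the default firewalld configuration:

    sudo firewall-cmd --permanent --add-service=http
    sudo firewall-cmd --reload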

By default, the Security-Enhanced Linux module will prevent several of Nagios’ CGI modules from operating. If you want to quickly get around that, simply disable SELinux. Since we’re not hosting an online banking website or anything, I feel that this is an appropriate solution:
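Open its configuration file with sudo, since it is root-owned:

    sudo nano /etc/selinux/config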

When nano opens the config file, change the line that reads SELINUX=enforcing to SELINUX=disabled. Press [CTRL]+[X] when finished, then [Y] and [Enter] to exit. I tinkered with some of the other options and mostly managed to break my system. If you’re concerned, then this article might help you.

Apache on CentOS ships with a default page that we need to disable:
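The default page lives in welcome.conf; renaming that file so Apache ignores it is one easy way to take it out of play:

    sudo mv /etc/httpd/conf.d/welcome.conf /etc/httpd/conf.d/welcome.conf.disabled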

Start Apache:
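Again through systemd:

    sudo systemctl start httpd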

Your Nagios installation can now be viewed at http://yourserver/nagios. If you want a quick test from the local command line:
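curl works well for this and is included in the minimal install:

    curl http://localhost/nagios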

If you get a 301 or a lot of HTML with an embedded message that your browser doesn’t work, that’s a good sign.

Configuring Apache to Use Nagios as the Default Site

If you don’t want to hang /nagios off of URL requests to your system, follow these directions.

Open /etc/httpd/conf.d/nagios.conf in any text editor. It’s a fairly large file, so WinSCP or Notepad++ will make the chore simpler. From nano:
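Run it with sudo, since the file is root-owned:

    sudo nano /etc/httpd/conf.d/nagios.conf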

Add the following lines, either at the beginning or the end of the file. I’ve highlighted two lines where you’ll want to substitute your system details for mine:
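Something along these lines works; the ServerName and ServerAlias values are the two lines to substitute with your own system details:

    <VirtualHost *:80>
        ServerName nagios.mydomain.mytld
        ServerAlias nagios
        RedirectMatch ^/$ /nagios/
    </VirtualHost>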

Restart Apache for the settings to take effect:
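Through systemd:

    sudo systemctl restart httpd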

Your Nagios installation will now appear at http://yourserver, without the need to add /nagios. If you changed those first two lines and you add matching records to your internal DNS server, the system will also respond at the specified URL.

Configure Nagios to Send E-mail on CentOS

By default, Nagios will use the /usr/bin/mail executable to send e-mail. You need to configure Postfix for that to work. There are many ways that can be done, and I have neither the time nor the systems to test them all. Fortunately, a document already exists that can help you with the most common configurations. You can also find several how-tos from the postfix documentation page. I will show you how to get started and I’ll demonstrate the two methods that I know. For anything else, you’ll need to research on your own.

Initial Postfix Configuration on CentOS

It’s easy to get started:
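Postfix ships with CentOS 7, but the following makes certain that both it and the mail command are present and that Postfix starts with the system:

    sudo yum install postfix mailx
    sudo systemctl enable postfix
    sudo systemctl start postfix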

The basic e-mail infrastructure is now on your system.

Relaying Mail Through a Friendly Mail Server

If you’ve got a mail server that will allow anonymous e-mail via port 25 connections (like an Exchange server that allows local addresses), you have very little to do.

Open /etc/postfix/main.cf in a text editor. This is a large file and you’re going to be doing a lot of navigating, so choose your editor wisely. For nano:
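Run it with sudo, since the file is root-owned:

    sudo nano /etc/postfix/main.cf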

Make these changes:

  • Uncomment the #myhostname line (by removing the #). Change it to: myhostname = myserver.mydomain.mytld (Substituting your server and domain information). This is the host name that it will present to the mail server.
  • Uncomment the #myorigin line. Change it to myorigin = mydomain.mytld (Substituting your domain information). E-mails sent by this server will append that domain to the user name.
  • Uncomment one of the #inet_interfaces lines or add a new one. Change it to: inet_interfaces = loopback-only. This sets this server to not receive any inbound e-mail.
  • After the #mydestination lines, add this: mydestination = . This will also prevent this server from accepting e-mail.
  • Uncomment one of the #relayhost lines or add a new one in this format: relayhost = myrealmailserver.mydomain.mytld. Substitute the real name or IP address of the host that will relay e-mail for this server.
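Taken together, the changed lines end up looking something like this, with placeholder names for you to replace:

    myhostname = myserver.mydomain.mytld
    myorigin = mydomain.mytld
    inet_interfaces = loopback-only
    mydestination =
    relayhost = myrealmailserver.mydomain.mytld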

Restart postfix:
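Through systemd:

    sudo systemctl restart postfix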

Relaying Mail Through Your ISP

Some of us don’t have our own mail server. If you’re paying for a static IP and have registered a domain name, then you could configure your new Postfix installation as a true mail server. But, most of us aren’t that lucky either. Instead, we can configure Postfix to log in to our ISP’s SMTP account and send e-mail as us. Credit to ProfitBricks.

Install the necessary security binaries:
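At minimum, Postfix needs the plain SASL mechanism in order to authenticate:

    sudo yum install cyrus-sasl cyrus-sasl-plain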

Open /etc/postfix/main.cf in a text editor. This is a large file and you’re going to be doing a lot of navigating, so choose your editor wisely. For nano:
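Run it with sudo, since the file is root-owned:

    sudo nano /etc/postfix/main.cf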

Make these changes:

  • Uncomment the #myhostname line (by removing the #). Change it to: myhostname = myserver.mydomain.mytld (Substituting your server and domain information). This is the host name that it will present to the mail server.
  • Uncomment one of the #relayhost lines or add a new one in this format: relayhost = smtpserver.yourisp.tld. Substitute the real name or IP address of your ISP according to their instructions for SMTP connections. If your ISP requires a different port (ex: Gmail), use brackets around the host name, a colon, and the port: relayhost = [smtp.gmail.com]:587.
  • At the end of the file, add:
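The following is a common minimal set for authenticated relaying; if your provider’s instructions differ, follow theirs. The sasl_passwd path matches the file created in the next step:

    smtp_use_tls = yes
    smtp_sasl_auth_enable = yes
    smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
    smtp_sasl_security_options = noanonymous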

Create a file to hold the username and password to be used with your ISP’s mail server:
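I use /etc/postfix/sasl_passwd, to match the map entry above:

    sudo nano /etc/postfix/sasl_passwd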

Inside that file, you’re going to enter your information in this format: SMTP-server-exactly-as-entered-in-main.cf username:password. Two examples:
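With made-up credentials, of course:

    [smtp.gmail.com]:587 myaccount@gmail.com:mypassword
    smtp.myisp.tld myuser@myisp.tld:mypassword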

Generate a lookup table for Postfix to retrieve the passwords:
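postmap builds a hashed sasl_passwd.db next to the source file:

    sudo postmap /etc/postfix/sasl_passwd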

Restart postfix:
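Through systemd:

    sudo systemctl restart postfix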

Remove the clear-text file containing your password:
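The hashed .db file that postmap generated stays behind; only the plain-text source needs to go:

    sudo rm /etc/postfix/sasl_passwd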

Nagios Is Installed!

You’ve completed all the functionality steps for the server! Walk through the web pages and check for any issues. I followed all of these same steps through and ended up with a fully functional system. If you’re having troubles, check that all of the prerequisite components installed successfully. If you have issues in one browser, try another.

You don’t have any sensors set up yet, so your displays will be very dull. We’ll rectify that in a bit. First, we need to talk about the Windows agent NSClient++.

Installing NSClient++

If you didn’t download NSClient++ before, do so now. NSClient++ has multiple deployment options. For your very first, I highly recommend one of the MSI installs. If you’ve got many systems, you might opt to grab a ZIP distribution as well. You can then mass-push pre-defined configurations via Robocopy, login scripts, or other means.

The installation screens:

  1. I didn’t show the initial screen. The first that you care about asks you to select the Monitoring Tool. Choose Generic.
    nagoncent_nsc1
  2. Next, choose your installation type. I ordinarily pick Custom so that I can deselect the op5 options, but any will work.
    nagoncent_nsc2
  3. Pay at least some attention to this screen. Everything here will be written to the NSClient++.ini file, so you can change it all later. These are the appropriate options to use with Nagios, but I’ll discuss each after the list.
    nagoncent_nsc3
  4. Finish the installation.

The configuration items that I instructed you to choose:

  • Allowed hosts: This field is required. Any source IP not in the list will be rejected by NSClient++. You can use ranges (ex: 192.168.0/24).
  • Password: check_nt uses this password (-s switch). check_nrpe does not care. By default, Nagios has a single check_nt command item that you call from other sensors. If you wish to prevent password-sharing, you’ll need to duplicate the check_nt command for each separate password.
  • Common check plugins: These are built-in plugins that you can use with NSClient++. I don’t do much with them, but you might.
  • Enable nsclient server (check_nt): You will almost certainly use several check_nt sensors.
  • Enable NRPE server (check_nrpe): My Hyper-V test scripts depend upon NRPE.
  • Insecure legacy mode (required by old check_nrpe). Since we aren’t configuring certificates, this setting is required by the current check_nrpe as well.
  • Enable NSCA client: I’m not using this client, so I didn’t enable it.
  • Enable Web server: I just configure by text file, so I didn’t enable this, either.

Configuring NSClient++ to Work with PowerShell

You’ll need to modify NSClient++ to work with PowerShell. The installer doesn’t do that.

The ini file can be found at C:\Program Files\NSClient++\nsclient.ini. If you installed the 32-bit NSClient++ on 64-bit Windows, look in C:\Program Files (x86). The ini file is quite a mess. The following is my nsclient.ini file, with all of the fluff stripped away and the necessary lines added for PowerShell to function. I’ve highlighted what you must add:
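Your file will not look exactly like mine; the following is a stripped-down sketch of the sections that matter, with a hypothetical check_example entry standing in for my customized script commands:

    [/modules]
    CheckExternalScripts = enabled
    NSClientServer = enabled
    NRPEServer = enabled

    [/settings/default]
    allowed hosts = 192.168.0.0/24
    password = yourpassword

    [/settings/NRPE/server]
    allow arguments = true
    insecure = true

    ; This wrapping is what hands .ps1 scripts to PowerShell
    [/settings/external scripts/wrappings]
    ps1 = cmd /c echo scripts\%SCRIPT% %ARGS%; exit($lastexitcode) | powershell.exe -command -

    [/settings/external scripts/wrapped scripts]
    check_example = check_example.ps1 $ARG1$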

The lines afterward show how I set up the commands and parameters for my customized scripts. The script bodies themselves are not included in this article (subscriber’s area, remember?).

Change your file to include the necessary lines and save the file. At an elevated command prompt, run:
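Restarting the NSClient++ service picks up the changes; from cmd:

    net stop nscp
    net start nscp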

It’s a normal Windows service with the name “nscp”, so you can also use ‘services.msc’, sc, or the PowerShell Stop-Service, Start-Service, and Restart-Service commands.

After the above, run netstat -aon | findstr LISTENING. Verify that there is a line item for 5666 (check_nrpe) and a line for 12489 (check_nt).

If this host is not configured to run unsigned PowerShell commands, run this at an elevated PowerShell prompt:
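RemoteSigned is enough for the scripts in this article:

    Set-ExecutionPolicy RemoteSigned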

Much has been written about the execution policy and I have nothing to add. You can do an Internet search to make your own decisions, of course. None of my scripts are signed, so you’ll need RemoteSigned or looser in order to use them.

That’s It!

Your deployment is complete! Now it’s time to learn how to manage Nagios and configure sensors.

Controlling Nagios

During Nagios sensor configuration, you’ll find that you spend a great deal of time managing the Nagios service. Nagios control from the Linux command line is very simple. You’ll soon memorize these commands. Activate them in a PuTTY session or a local console.

ALWAYS Check the Nagios Configuration

After making any changes to configuration files, verify that they are valid before attempting to apply them to the running configuration:
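The check is non-destructive, so run it as often as you like:

    sudo service nagios checkconfig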

If there are any problems, you’ll be told what they are and where to find them in the files. As long as you don’t stop Nagios, it will continue running with the configuration that it was started with. That gives you plenty of time to fix any errors.

Restart Nagios

Restart the Nagios service (only after verifying configuration!):
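A restart applies the new configuration in one step:

    sudo service nagios restart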

Stop and Start Nagios

If you need to take Nagios offline for a while and bring it up later (or if you forgot to checkconfig and have to recover from a broken setup), these are the commands:
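Stop and start are separate commands:

    sudo service nagios stop
    sudo service nagios start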

Verify that Nagios is Running

Usually, the ability to access the web site is a good indication of whether or not Nagios is operational. If you want to check from within the Linux environment:
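systemctl gives a quick health readout plus the most recent log lines:

    sudo systemctl status nagios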

This will usually fill up the screen with information, so you’ll be given the ability to scroll up and down with the arrow keys to read all of the messages. Press [Q] when you’re finished.

An Introduction to Nagios Configuration Files

From here on out, I will be using WinSCP to manipulate the Nagios configuration files on the Linux host. Use PuTTY to issue the commands to check and restart the Nagios service after configuration file changes. You do not need to restart the Apache service.

Personally, I connect using the nagios account that we created in the beginning; that way, you never run into file permission problems. WinSCP also remembers the last folder that it was in per user, which makes navigation easier. Just make a separate entry to the host for that account:

WinSCP Nagios Site

Work your way to /usr/local/nagios/etc. This is the root configuration folder. It mostly contains information that drives how Nagios processes the other configuration files.

Nagios Root etc

This location contains four files. I’m not going to dive into them in great detail, but I encourage you to open them up and give their contents a look-over to familiarize yourself.

  • cgi.cfg: As its header text says, this file configures the CGI programs that power the web interface. I have not changed anything in it.
  • htpasswd.users: This is the file that Apache checks to authenticate users of the web interface. Use the instructions at the top of this article to modify it.
  • nagios.cfg: This file contains a number of configuration elements for how Nagios interacts with the system. We are going to modify the OBJECT CONFIGURATION FILE(S) portion momentarily.
  • resource.cfg: This file holds customizable macros that you create, like the ones for e-mail.

Now, open up /usr/local/nagios/etc/objects. This is where the real work is done.

Nagios Configuration Folder in WinSCP

The file names are for your convenience only. Nagios reads them all the same way. So, don’t get agitated if you feel like a host template definition would be better in some file other than templates.cfg; Nagios doesn’t care as long as everything is formatted properly. This is what the files generally mean:

  • commands.cfg: This contains the commands that constitute the actual checks. For instance, check_ping is defined here.
  • contacts.cfg: When Nagios needs to tell somebody something, this is where those somebodies’ information is stored. It’s also where you connect users to time periods. For example, I have my administrative account in the business hours time period because I don’t really want to be woken up in the middle of the night because my test lab is unhappy.
  • localhost.cfg: Contains checks for the Linux system that runs Nagios.
  • printer.cfg: Define printer objects and checks here.
  • switch.cfg: Physical switches and their check definitions are in this file.
  • templates.cfg: Basic definitions that other definitions can inherit from are contained within.
  • timeperiods.cfg: You probably don’t want to be notified in the middle of the night when a switch misses a single ping, but you might want to know about it during normal work hours. Define what “normal work hours” and “leave me alone” time is in this file.
  • windows.cfg: Basic definitions for Windows hosts and checks.

Poke through these and get a feel for how Nagios is configured.

Nagios Objects and Their Uses

Nagios uses a few species of objects. Getting these right is important. Use the template file to guide you. The most pertinent objects are listed below:

  • contact: A target for notifications — usually an individual.
  • host: A host is any endpoint that can be checked. A computer, a switch, a printer, and a network-enabled refrigerator all qualify as a host.
  • command: Nagios checks things by running commands. The command files live in its plugins folder. The command definitions explain to Nagios how to call those plugins.
  • service: A “service” in Nagios is anything that Nagios can check with a command, and is a much more vague term than it is in Windows. In Nagios, services belong to hosts. So, if you want to know if a switch is alive by pinging it, the switch is a “host” and the ping is a “service” that calls a “command” called check_ping. You might call these “sensors” to compare to other products.
  • host group: Multiple hosts that are logically lumped together constitute a host group. Use them to apply one service to lots of hosts at once.
  • time period: This object is fairly well-explained by its name. They’re probably best understood by looking in the timeperiods.cfg file.

Nagios Templates

I’d say that the best place to start looking at Nagios objects is in the templates file. This is a copy/paste of the Contact template:
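If your templates.cfg is stock, the block looks much like this:

    define contact{
            name                            generic-contact
            service_notification_period     24x7
            host_notification_period        24x7
            service_notification_options    w,u,c,r,f,s
            host_notification_options       d,u,r,f,s
            service_notification_commands   notify-service-by-email
            host_notification_commands      notify-host-by-email
            register                        0
            }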

Start with the define line that indicates what type of object this block is describing. Most importantly, it signals to Nagios which properties should exist. Within this particular block, all properties for a contact are present with specific settings for each. If you use this template with a new object, then these will be its default settings. Next, notice the register line. By setting it to 0, you make it unavailable for Nagios to use directly, which is what makes this definition a template. Now, look at an implementation of the above template:
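The stock contacts.cfg ships with a definition much like this one:

    define contact{
            contact_name            nagiosadmin
            use                     generic-contact
            alias                   Nagios Admin
            email                   nagios@localhost
            }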

It is also defined as a contact. First, notice the use line. Its name matches that of the template. That means that you don’t need to provide every setting for this contact, only the ones that you want to differ from the template. It is not necessary for an object to use a template; you can fill out all details for an object. A live object cannot use another live object, but one template can use another.

I often make backups of my configuration files before tinkering with them. WinSCP makes this simple with the Duplicate command. I also tend to copy my live configuration files to a safe place. Even though this whole thing seems easy to understand, you will make mistakes. Some of your mistakes are going to seem very stupid in retrospect. Always, always, always run sudo service nagios checkconfig before applying any new changes!

Nagios Hosts

A host in Nagios is an endpoint. It’s an easy definition in my case because I am going to specifically talk about Hyper-V hosts. The following are host definitions that I created for my environment:
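The host names and addresses here are placeholders; substitute your own:

    define host{
            use             hyper-v-server
            host_name       svhv01
            alias           Hyper-V Host 1
            address         192.168.25.10
            }

    define host{
            use             hyper-v-server
            host_name       svhv02
            alias           Hyper-V Host 2
            address         192.168.25.11
            }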

These hosts use a template that I created:
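It boils down to something like this; the hyper-v-server name is arbitrary, so pick whatever suits you:

    define host{
            name            hyper-v-server
            use             windows-server
            hostgroups      hyper-v-servers
            register        0
            }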

You’ll notice that this template uses the base windows-server template, but really makes no changes. I’m not overriding much in the windows-server template, so I could have all of my hosts use that one directly. However, creating a template to set up an inheritance hierarchy now is an inexpensive step that gives me flexibility later.

Nagios Groups

Most of the singular objects, like contacts and hosts, also have a corresponding group object. You might have noticed in my Hyper-V host template that it has a hostgroups property. Every host object that uses this template will be a member of the hyper-v-servers host group. Groups have very simple definitions:
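A host group needs little more than a name and an alias:

    define hostgroup{
            hostgroup_name  hyper-v-servers
            alias           Hyper-V Servers
            }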

I could also have used a members property within the host group definition or a hostgroups property within my Hyper-V host definitions to accomplish the same thing. This is less typing.

Host groups are very useful. First, they get their own organization on the Host Groups menu item in the Nagios web interface:

Nagios Host Groups Display

Second, you can define services at the host group level. That’s important, because otherwise, you’d have to define services for each and every host that you want to check, even if they’re all using the same check!

Nagios Services

Don’t let the term service confuse you with the same thing in a Windows environment. In Nagios, a service has a broader, although still perfectly correct definition. Anything that we can check is a service, whether that’s a ping response, Apache returning valid information on port 80, or even the output of a customized script like I have created for Hyper-V items.

The following is a service that I have created to monitor a Windows service — the Hyper-V Virtual Machine Management service, to be exact:
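The service_description is free text; vmms is the actual Windows service name, and generic-service is the stock service template:

    define service{
            use                     generic-service
            hostgroup_name          hyper-v-servers
            service_description     Hyper-V Virtual Machine Management
            check_command           check_nt!SERVICESTATE!-d SHOWALL -l vmms
            }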

Notice my use of hostgroup_name so that I only have to create this service one time. If I were creating a service for a specific host, I would use host_name instead.

I encourage you to look at the documentation for services. You may want to change the frequency of when checks occur. You may also want to redefine how long a service can be in a trouble state before you are notified.

Useful Nagios Objects Documentation

I’ve spent a little bit of time going over the objects within Nagios, but there is already a wealth of documentation on them. You will, no doubt, want to configure Nagios items on your own. NSClient++ also has a great deal more capability than what I’ve shown. These links helped me more than anything else:

Dealing with Problems Reported by Nagios

The web display is nice, and everyone enjoys seeing a screen-full of happy green monitor reports, but that’s not why we set up Nagios installations. Things break, and we want to know before users start calling. With the configuration that you have, you’ll start getting notifications as soon as you set yourself up as a contact with valid information. When a problem occurs, Nagios will mark it as being in a SOFT warning or critical state, then it will wait to see if the problem persists for a total of three check periods (configurable). On the third check, it will mark the service as being in a HARD warning or critical state and send a notification.

If you fix a problem quickly, or if it resolves on its own, you’ll get a Recovery e-mail to let you know that all is well again. If the problem persists, you’ll continue getting an e-mail every few minutes (configurable). If one host has many services in a critical state, or if many separate hosts have issues, you’re going to be looking at a lot of e-mails.

The following screenshot shows what a service looks like in a critical state. You can see it on the Services menu item, the Services submenu under the Problems menu, and on the (Unhandled) link that is next to it.

Nagios Critical Service

If you click the link for the name of the service, in this case, Service: DNS, it will take you to the following details screen:

Nagios Service Detail

Take some time to familiarize yourself with this screen. I’m not going to discuss every option, but they are all useful. For now, I want you to look at Acknowledge this service problem.

Acknowledging Problems in Nagios

“Acknowledging” means that you are aware that the service is down. Once acknowledged, an acknowledgement notification will be sent out, but then no further notifications until the service is recovered. Basically, you’re telling Nagios, “Yes, yes, I know it’s down, leave me alone!” Click the Acknowledge this service problem link as shown in the previous screen shot and you’ll be taken to the following screen:

Nagios Acknowledgement

You can read the Command Description for an extended explanation of what Acknowledgement does and what your options are. I tend to fill out the comments field, but it’s up to you. Upon pressing Commit, the notification message is sent out and Nagios stops alerting until the service recovers (sometimes you get one more problem notification first).

Rescheduling a Nagios Service Check

Nagios runs checks on its own clock. You might have a service that doesn’t need frequent checks, so you might set it to only be tested every hour. During testing, you certainly won’t want to wait that long to see if your check is going to work. You might also want that Recovery message to go out right away after fixing a problem. In the service detail screen as shown a couple of screen shots up, click the Re-schedule the next check of this service link:

Reschedule Nagios Service Check

Of course, the time in the screen shot doesn’t mean anything to you. It’s the exact moment that I clicked the link on my system. If you then click Commit, it will immediately run the check. It might still take a few moments for the results to be returned so you won’t necessarily see any differences immediately, but the check does occur on time.

Scheduling Downtime

Smaller shops might not find it important to schedule downtime. If your Hyper-V host can reboot in less than 15 minutes, then you might not even get a downtime notification using the default settings. However, Nagios will give you the ability to start providing availability reports. Wouldn’t it be nice to show your boss and/or the company owners that your host was only ever down during scheduled maintenance windows?

From the service detail screen shot earlier, you can see the Schedule downtime for this service link. I’m assuming that you’ll be more likely to want to set downtime on a host rather than an individual service. The granularity is there for you to do either (or both) as suits your needs. A host’s detail screen (not shown) has Schedule downtime for this host and Schedule downtime for all services on this host links. You can also schedule downtime for an entire host or service group. These screens all look like this:

Nagios Downtime Scheduler

During scheduled downtime, notifications aren’t sent. In all reports, any outages during downtime are in the Scheduled category rather than Unscheduled.

The default Nagios Core distribution does not have a way to automatically schedule recurring downtime. There are some community-supported options.

Nagios Availability Reports

You saw a link in the service details screen shot above to View Availability Report for This Service. Hosts and services have this link. There’s also an Availability menu in the Reports section on the left that allows you to build custom reports. The following is a simple host availability report:

Nagios Availability

This is only for a single day. Notice the report options in the top right.

User Management for Nagios

So far, I’ve had you use the “nagiosadmin” account. As you spread out your deployment, you’re going to also set up new contacts. If you like, you can restrict those contacts to only see their own systems.

First, add a user to the nagcmd group. This will allow them to configure Nagios’ files. Be careful! If you don’t trust someone, skip this part and handle configuration for them. Optionally, you can skip the usermod parts (adds them to a group) and give them targeted access to specific configuration files.
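Using andy as the example account for the rest of this section:

    sudo usermod -a -G nagcmd andy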

Due to a difference between the Windows security model and the Linux security model, there is no secure way for Apache on Linux to read the system users. So, you need to create a completely separate account for the Nagios web interface:
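Note the absence of -c this time, since the htpasswd.users file already exists:

    sudo htpasswd /usr/local/nagios/etc/htpasswd.users andy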

Now, create a Nagios contact with a matching contact_name:
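The e-mail address is a placeholder, naturally:

    define contact{
            contact_name    andy
            use             generic-contact
            alias           Andy
            email           andy@mydomain.mytld
            }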

The hosts, services, contact groups, etc. that the “andy” account is attached to will determine what Andy sees when he logs in to the Nagios web interface.

Using Nagios to Monitor Hyper-V – The real fun stuff starts here!

You now have all the tools you need to build your Hyper-V monitoring framework with Nagios. I’ve also written a few scripts and services that will get you up and running: Required Base Scripts, Monitoring the Oldest Checkpoint Age, Monitoring Dynamically Expanding VHDX Size, and more.

If you’d like to pick up these scripts and services, please register below to get access!

CentOS Linux on Hyper-V

CentOS Linux on Hyper-V


Microsoft continues turning greater attention to Linux. We can now run PowerShell on Linux, we can write .Net code for Linux, we can run MS SQL on Linux, Linux containers will run natively in Windows containers… the list just keeps growing. You’ve been able to find Linux-on-Hyper-V on that list for a while now, and the improvements have continued to roll in.

Microsoft provides direct support for Hyper-V running several Linux distributions as well as FreeBSD. If you have an organizational need for a particular distribution, then someone already made your choice for you. If you’re just getting started, then you need to make that decision yourself. I’m not a strong advocate for any particular distribution. I’ve written in the past about using Ubuntu Server as a guest. However, there are many other popular distributions available and I like to branch my knowledge.

Why Choose CentOS?

I’ve been using Red Hat’s products off and on for many years and have some degree of familiarity with them. At one time, there was simply “Red Hat Linux”. As a commercial venture attempting to remain profitable, Red Hat decided to create “Red Hat Enterprise Linux” (RHEL), which you must pay to use. Because Red Hat is sensitive to how firmly the concept of free (as in what you normally think of when you hear “free”) is attached to Linux in the collective consciousness, they also make most of RHEL available to the CentOS Project.

One of the reasons that I chose Ubuntu was its ownership by a commercial entity. That guarantees that if you’re ever really stuck on something, there will be at least one professional entity that you can pay to assist you. CentOS doesn’t have that kind of direct backing. However, I also know (from experience) that relatively few administrators ever call support. Most that do work for bigger organizations that are paying for RHEL or the like. The rest will call some sort of service provider, like a local IT outsourcer. With that particular need mitigated, we are left with:

  • CentOS is based on RHEL. This is not something that someone is assembling in their garage (not that I personally think that’s a problem, but your executives may disagree)
  • CentOS has wide community support and familiarity. You can easily find help on the Internet. You will also not struggle to find support organizations that you can pay for help.
  • CentOS has a great deal in common with other Linux distributions. Because Linux is open source software, it’s theoretically possible for a distribution to completely change everything about it. In practice, no one does. That means that the bulk of knowledge you have about any other Linux distribution is applicable to CentOS.

That hits the major points that will assure most executives that you’re making a wise decision. In the scope of Hyper-V, Microsoft’s support list specifically names CentOS. It’s even first on the list, if that matters for anything.

Stable, Yet Potentially Demanding

When you use Linux’s built-in tools to download and install software, you are working from approved repositories. Essentially, it means that someone decided that a particular package adequately measured up to a standard. Otherwise, you’d need to go elsewhere to acquire that package.

The default CentOS repositories are not large when compared to some other distributions, and do not contain recent versions of many common packages, including the Linux kernel. However, the versions offered are known to be solid and stable. If you want to use more recent versions, then you’ll need to be(come) comfortable manually adding repositories and/or acquiring, compiling, and installing software.

No GUIs Here

CentOS does make at least one GUI available, but I won’t be covering it. I don’t know if CentOS’s GUI requires 3D acceleration the way that Ubuntu’s does. If it does, then the GUI experience under Hyper-V would be miserable. However, I didn’t even attempt to use any CentOS GUIs because they’re really not valuable for anything other than your primary-use desktop. If you’re new to Linux and the idea of going GUI-free bothers you, then take heart: Linux is a lot easier than you think it is. I don’t think that any of the Linux GUIs score highly enough in the usability department to meaningfully soften the blow of transition anyway.

If you’ve already read my Ubuntu article, then you’ve already more or less seen this bit. Linux is easy because pretty much everything is a file. There are only executables, data, and configuration files. Executables can be binaries or text-based script files. So, any time you need to do anything, your first goal is to figure out what executable to call. Configuration files are almost always text-based, so you only need to learn what to set in the configuration file. The Internet can always help out with that. So, really, the hardest part about using Linux is figuring out which executable(s) you need to solve whatever problem you’re facing. The Internet can help out with that as well. You’re currently reading some of that help.

Enough talk. Let’s get going with CentOS.

Downloading CentOS

You can download CentOS for free from www.centos.org. As the site was arranged on the day that I wrote this article, there was a “Get CentOS” link in the main menu at the top of the screen and a large orange button stamped “Get CentOS Now”. From there, you are presented with a few packaging options. I chose “DVD ISO” and that’s the base used in this article. I would say that if you have a Torrent application installed, choose that option. It took me quite a bit of hunting to find a fast mirror.

For reference, I downloaded CentOS-7-x86_64-DVD-1611.iso.

How to Build a Hyper-V Virtual Machine for CentOS

There’s no GUI and CentOS is small, so don’t create a large virtual machine. These are my guidelines:

  • 2 vCPUs, no reservation. All modern operating systems work noticeably better when they can schedule two threads as opposed to one. You can turn it up later if your deployment needs more.
  • Dynamic Memory on; 512MB startup memory, 256MB minimum memory, 1GB maximum memory. You can always adjust Dynamic Memory’s maximum upward, even when the VM is active. Start low.
  • 40GB disk is probably much more than you’ll ever need. I use a dynamically expanding VHDX because there’s no reason not to. The published best practice is to create this with a forced 1 megabyte block size, which must be done in PowerShell. I didn’t do this on my first several Linux VMs and noticed that they do use several gigabytes more space, although still well under 10 apiece. I leave the choice to you.
  • I had troubles using Generation 2 VMs with Ubuntu Server, but I’m having better luck with CentOS. If you use Generation 2 with your CentOS VMs on Hyper-V 2012 R2/8.1 or earlier, remember to disable Secure Boot. If using 2016, you can leave Secure Boot enabled as long as you select the “Microsoft UEFI Certificate Authority” template.
  • If your Hyper-V host is a member of a failover cluster and the Linux VM will be HA, use a static MAC address. Linux doesn’t respond well when its MAC addresses change.

The following is a sample script that you can modify to create a Linux virtual machine in Hyper-V:
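Treat this as a sketch rather than a finished product; the parameter names and defaults are mine, and it follows the guidelines above (2 vCPU, Dynamic Memory 512MB/256MB/1GB, 40GB dynamic VHDX with a 1MB block size, Generation 2):

    param(
        [Parameter(Mandatory=$true)][String]$VMName,
        [String]$InstallISOPath = 'C:\ISO\CentOS-7-x86_64-DVD-1611.iso',
        [String]$VHDXStoragePath = 'C:\VMs\Virtual Hard Disks',
        [String]$SwitchName = 'vSwitch'
    )

    $VHDXPath = Join-Path -Path $VHDXStoragePath -ChildPath "$VMName.vhdx"

    # 1 MB block size is the published recommendation for Linux guests
    New-VHD -Path $VHDXPath -SizeBytes 40GB -Dynamic -BlockSizeBytes 1MB

    $VM = New-VM -Name $VMName -Generation 2 -MemoryStartupBytes 512MB -VHDPath $VHDXPath -SwitchName $SwitchName
    Set-VM -VM $VM -ProcessorCount 2 -DynamicMemory -MemoryMinimumBytes 256MB -MemoryMaximumBytes 1GB

    # Attach the installation ISO and boot from it
    $DVD = Add-VMDvdDrive -VMName $VMName -Path $InstallISOPath -Passthru
    Set-VMFirmware -VMName $VMName -FirstBootDevice $DVD

    # 2016: keep Secure Boot on with the open-source certificate authority template
    Set-VMFirmware -VMName $VMName -SecureBootTemplate 'MicrosoftUEFICertificateAuthority'
    # 2012 R2 alternative: Set-VMFirmware -VMName $VMName -EnableSecureBoot Off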

If you’re going to use this a lot, I would consider entering some defaults on the parameters so that you don’t need to enter them all each time. For instance, you’re probably not going to move your install ISO often.

You could also use your first installation as the basis for a clone. Use a generic name for the VM/VHDX if that’s your plan.

A Walkthrough of CentOS Installation

When you first boot, it will default to Test this media & install CentOS 7. I typically skip the media check and just Install CentOS Linux 7.

centos_install1

Choose your language:

centos_install2

After selecting the language, you’ll be brought to the Installation Summary screen. Wait a moment for it to detect your environment. As an example, the screenshot shows Not Ready for the Security Policy. It will change to No profile selected once it has completed its checks.

centos_install3

You can work through the items in any order. Anything without the warning triangle can be skipped entirely.

I start with the NETWORK & HOST NAME screen as that can have bearing on other items. When you first access the screen, it will show Disconnected because it hasn’t been configured yet. That’s different behavior from Windows, which will only show disconnected if the virtual network adapter is not connected to a virtual switch.

centos_install4

If you’ll be using DHCP, click the Off slider button at the top right for it to attempt to get an IP. For static or advanced configuration, click Configure. I’ve shown the IPv4 Settings tab. Fill out that tab, and the others, as necessary.

centos_install5

Don’t forget to change the host name at the lower left of the networking screen before leaving.

After you’ve set up networking, set the DATE & TIME. If it can detect a network connection, you’ll be allowed to set the Network Time slider to On. Configure as desired.

centos_install6

You must click into the Installation Destination screen or the installer will not allow you to proceed. By default, it will select the entirety of the first hard drive for installation. It will automatically figure out the best way to divide storage. You can override it if you like. If you’re OK with defaults, just click Done.

centos_install8

Explore the other screens as you desire. I don’t set anything else on my systems. At this point, you can click Begin Installation.

centos_install9

While the system installs, you’ll be allowed to set the root password and create the initial user.

centos_install10

As you enter the password for root, the system will evaluate its strength. If it decides that the password you chose isn’t strong enough, you’ll be forced to click Done twice to confirm. The root account is the rough equivalent of the Administrator account on Windows, so do take appropriate steps to secure it with a strong password and exercise care in the keeping of that password.

centos_install11

The user creation screen is straightforward. It has the same password-strength behavior as the root screen.

centos_install12

Now just wait for the installation to complete. Click Reboot. You’ll be left at the login screen of a completely installed CentOS virtual machine:

centos_install13

Assuming that you created a user for yourself and made it administrator, it’s best to log in with that account. Otherwise, you can log in as root. It’s poor practice to use the root account, and even worse to leave the root account logged in.

CentOS Post-Install Wrap-Up for Hyper-V

I have a bit of a chicken-and-egg problem here. You need to do a handful of things to wrap-up, but to do that easily, it helps to know some things about Linux. If you already know about Linux, this will be no problem. Otherwise, just follow along blindly. I’ll explain more afterward. CentOS doesn’t need much, fortunately.

To make this a bit easier, you might want to wait until you’ve met PuTTY. It allows for simple copy/paste actions. Otherwise, you’ll need to type things manually or use the Paste Clipboard feature in the Hyper-V VMCONNECT window. Whatever you choose, just make sure that you follow these steps sooner rather than later.

1. Install Nano

Editing text files is a huge part of the Linux world. It’s also one of the most controversial, bordering on zealotry. “vi” is the editor of choice for a great many. Its power is unrivaled; so is its complexity. I find using vi to be one of the more miserable experiences in all of computing and I refuse to do it when given any choice. Conversely, the nano editor is about as simple as a text-editing tool can be in a character mode world and I will happily use it for everything. Install it as follows:
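Run it through yum:

    sudo yum install nano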

The command is case-sensitive and you will be prompted for your password if not logged in as root.

2. Enable Dynamic Memory In-Guest

You need to enable the Hot Add feature if you want to use Dynamic Memory with CentOS.

Start by creating a “rules” file. The location is important (/etc/udev/rules.d) but the name isn’t. I’ll just use the same one from Microsoft’s instructions:
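Create it in nano, with sudo since /etc is root territory:

    sudo nano /etc/udev/rules.d/100-balloon.rules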

You may be prompted for your password.

You’ll now be looking at an empty file in the nano editor. Type or paste the following:
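The rule is a single line:

    SUBSYSTEM=="memory", ACTION=="add", ATTR{state}="online"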

Now press [CTRL]+[X] to exit, then press [Y] to confirm and [Enter] to confirm the filename.

At next reboot, Dynamic Memory will be functional.

3. Install Extra Hyper-V Tools

Most of the tools you need to successfully run Linux on Hyper-V are built into the CentOS distribution. There are a few additional items that you might find of interest:

  • VSS daemon (for online backup)
  • File copy daemon so you can use PowerShell to directly transfer files in from the host
  • KVP daemon for KVP transfers to and from the host

To install them:
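The hyperv-daemons package bundles all three:

    sudo yum install hyperv-daemons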

4. Change the Disk I/O Scheduler

By default, Linux wants to help optimize disk I/O. Hyper-V also wants to optimize disk I/O. Two optimizers are usually worse than none. Let’s disable CentOS’s.

You must be root for this.

You’ll be prompted for the root password.
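The sequence looks something like this; sda is assumed to be the primary virtual disk, and a setting applied this way lasts until the next reboot:

    su -
    echo noop > /sys/block/sda/queue/scheduler
    exit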

The above will change the scheduler to “noop”, which means that CentOS will not attempt to optimize I/O for the primary hard disk. “exit” tells CentOS to exit from the root login back to your login.

Credit for the echo method goes to the authors at nixCraft.

10 Tips for Getting Started with CentOS Linux on Hyper-V

This section is for those with Windows backgrounds. If you already know Linux, you probably won’t get anything out of this section. I will write it from the perspective of a seasoned Windows user. Nothing here should be taken as a slight against Linux.

1. Text Acts Very Differently.

Above all, remember this: Linux is CaSE-SENsiTiVe.

  • yum and Yum are two different things. The first is a command. The second is a mistake.
  • File and directory names must always be typed exactly.

Password fields do not echo anything to the screen.

2. Things Go the Wrong Way

In Windows, you’re used to C: drives and D: drives and SMB shares that start with \\.

In Linux, everything begins with the root, which is just a single /. Absolutely everything hangs off of the root in some fashion. You don’t have a D: drive. Starting from /, you have a dev location, and drives are mounted there. For the SATA virtual drives in your Hyper-V machine, they’ll all be sda, sdb, sdc, etc. So, /dev/sdb would be the equivalent to your Windows D: drive.

Partitions are just numbers appended to the drive. sda1, sda2, etc.

Directory separators are slashes (/) not backslashes (\). A directory that you’ll become familiar with is usr. It lives at /usr.

Moving around the file system should be familiar, as the Windows command line uses similar commands. Linux typically uses ls where Windows uses dir, but CentOS accepts dir. cd and mkdir work as they do on Windows. Use rm to delete things. Use cp to copy things. Use mv to move things.

Running an executable in the folder that you currently occupy by just typing its name does not work. PowerShell behaves the same way, so that may not be strange to you. Use dot and slash to run a script or binary in the same folder:
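With myscript.sh standing in for whatever you want to run:

    ./myscript.sh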

Linux doesn’t use extensions. Instead, it uses attributes. So, if you create the equivalent of a batch file and then try to execute it, Linux won’t have any idea what you want to do. You need to mark it as executable first. Do so like this:
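Again with a stand-in name:

    chmod +x myscript.sh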

As you might expect, -x removes the executable attribute.

The default Linux shell does have tab completion, but it’s not the same as what you find on Windows. It will only work for files and directories, for starters. Second, it doesn’t cycle through possibilities the way that PowerShell does. The first tab press works if there is only one way for the completion to work. A second tab press will show you all possible options. You can use other shells with more power than the default, although I’ve never done it.

3. Quick Help is Available

Most commands and applications have a -h and/or a --help parameter that will give you some information on running them. --help is often more detailed than -h. You can sometimes type man commandname to get other help (“man” is short for “manual”). It’s not as consistent as PowerShell help, but then PowerShell’s designers got to work with the benefits of hindsight and rigidly controlled design and distribution.

4. You Can Go Home

You’ve got your own home folder, which is the rough equivalent of the “My Documents” folder in Windows. It’s at the universal alias ~. So, cd ~  takes you to your home folder. You can reference files in it with ~/filename.

5. Boss Mode

“root” is the equivalent of “Administrator” on Windows. But, the account you made has nearly the same powers — although not exactly on demand. You won’t have root powers until you specifically ask for them with “sudo”. It’s sort of like “Run as administrator” in Windows, but a lot easier. In fact, the first time you use sudo, the intro text tells you a little bit about it:

centos_sudo

So basically, if you’re going to do something that needs admin powers, you just type “sudo” before the command, just like it says. The first time, it will ask for a password. It will remember it for a while after that. However, 99% of what I do is administrative stuff, so I pop myself into a sudo session that persists until I exit, like this:
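sudo’s shell mode is one way to do that:

    sudo -s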

You’ll have to enter your password once, and then you’ll be in sudo mode. You can tell that you’re in sudo mode because the dollar sign in the prompt will change to a hash sign:

centos_sudos

I only use Linux for administrative work, so I always use the account with my name on it. However, even when it’s not in sudo mode, it’s still respected as an admin-level account. If you will be using a Linux system as your primary (i.e., you’ll be logged in often), create a non-administrative account to use. You can flip to your admin account or root anytime:
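su handles both; the account name here is a stand-in, and omitting it entirely switches to root:

    su - myadminaccount
    su -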

Always respect the power of these accounts.

6. “Exit” Means Never Having to Say Goodbye

People accustomed to GUIs with big red Xs sometimes struggle with character mode environments. “exit” works to end any session. If you’re layered in, as in with sudo or su, you may need to type “exit” a few times. “logout” works in most, but not all contexts.

7. Single-Session is for Wimps

One of the really nifty things about Linux is multiple concurrent sessions. When you first connect, you’re in terminal 1 (tty1). Press [Alt]+[Right Arrow]. Now you’re in tty2! Keep going. 6 wraps back around to 1. [Alt]+[Left Arrow] goes the other way.

You need to be logged in to determine which terminal you’re viewing. Just type tty.

8. Patches, Updates, and Installations, Oh My.

Pretty much all applications and OS components are “packages”. “yum” and “rpm” are your package managers. They’re a bit disjointed, but you can usually find what you need to know with a quick Internet search.

Have your system check to see if updates are available (more accurately, this checks the version data on download sources):
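yum has a dedicated subcommand for this:

    sudo yum check-update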

Install package patches and upgrades:
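This one will prompt for confirmation before it changes anything:

    sudo yum update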

There’s also an “upgrade” option which goes a bit further. Update is safer, upgrade gets more.

Show all installed packages that yum knows about:
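No sudo needed just to look:

    yum list installed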

The rpm tool shows different results, but for my uses yum is sufficient.

Find a particular installed package, in this case, “hyperv” (spelling/case counts!):
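Pipe the same list through grep:

    yum list installed | grep hyperv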

Look for available packages:
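The search term here is just an example:

    yum search httpd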

Install something (in this case, the Apache web server):
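yum resolves the dependencies for you:

    sudo yum install httpd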

9. System Control

CentOS’s equivalent to Task Manager is top. Type top  at a command prompt and you’ll be taken right to it. Use the up and down arrows and page up and page down to move through the list. Type a question mark [?] to be taken to the help menu that will show you what else you can do. Type [Q] to quit.

10. OK, I’m Done Now

If you’ve used the shutdown command in Windows, then you’ll have little trouble transitioning to Linux. shutdown  tells Linux to shut down gracefully with a 1 minute timer. All active sessions get a banner telling them what’s coming.

Immediate shutdown (my favorite):
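The now argument skips the one-minute timer:

    sudo shutdown -h now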

Reboot immediately:
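The -r switch reboots instead of halting:

    sudo shutdown -r now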

There’s also an -H switch, which halts the system without powering it off. I don’t use that one.

Useful Tools for CentOS Linux

Manipulating your CentOS environment from the VMConnect console will get tiring quickly. Here are some tools to make managing it much easier.

Text Editors

I already showed you nano. Just type nano at any prompt and press [Enter] and you’ll be in the nano screen. The toolbar at the bottom shows you what key presses are necessary to do things, ex: [CTRL]+[X] to exit. Don’t forget to start it with sudo if you need to change protected files.

The remote text editing tool that I use is Notepad++. It is a little flaky — I sometimes get Access Denied errors with it that I don’t get in any other remote tool (setting it to Active mode seems to help a little). But, the price is hard to beat. If I run into real problems, I run things through my home folder. To connect Notepad++ to your host:

  1. In NPP, go to Plugins->NppFTP->Show NppFTP Window (only click if it’s not checked):
    NPP FTP Window Selector

  2. The NppFTP window pane will appear at the far right. Click the icon that looks like a gear (which is, unfortunately, gray in color so it always looks disabled), then click Profile Settings:
    NPP FTP Profile Item
  3. In the Profile Settings window, click the Add New button. This will give you a small window where you can provide the name of the profile you’re creating. I normally use the name of the system.
    Add FTP Profile
  4. All the controls will now be activated.
    1. In the Hostname field, enter the DNS name or the IP address of the system you’re connecting to (if you’re reading straight through, you might not know this yet).
    2. Change the Connection type to SFTP.
    3. If you want, save the user name and password. I don’t know how secure this is. I usually enter my name and check Ask for password. If you don’t check that and don’t enter a password, it will assume a blank password.
      NPP FTP Profile

  5. You can continue adding others or changing anything you like (I suggest going to the Transfers tab and setting the mode to Active). Click Close when ready.
  6. To connect, click the Connect icon which will now be blue-ish. It will have a drop-down list where you can choose the profile to connect to.
    NPP FTP Connect
  7. On your first connection, you’ll have to accept the host’s key:
    NPP Host Key
  8. If the connection is successful, you’ll attach to your home folder on the remote system. Double-clicking an item will attempt to load it. Using the save commands in NPP will save back to the Linux system directly.
    NPP FTP Directory

Remember that NPP is a Windows app, and as a Windows app, it wants to save files in Windows format (I know, weird, right?). Windows expects that files encoded in human-readable formats will end lines using a carriage-return character and a linefeed character (CRLF, commonly seen escaped as \r\n). Linux only uses the linefeed character (LF, commonly seen escaped as \n). Some things in Linux will choke if they encounter a carriage return. Any time you’re using NPP to edit a Linux file, go to Edit -> EOL Conversion -> UNIX/OSX Format.

NPP EOL Conversion

WinSCP

WinSCP allows you to move files back and forth between your Windows machine and a Linux system. It doesn’t have the weird permissions barriers that Notepad++ struggles with, but it also doesn’t have its editing powers.

  1. Download and install WinSCP. I prefer the Commander view but do as you like.
  2. In the Login dialog, highlight New Site and fill in the host’s information:
    WinSCP Profiles
  3. Click Save to keep the profile. It will present a small dialog asking you to customize how it’s saved. You can change the name or create folders or whatever you like.
  4. With the host entry highlighted, click Login. You’ll be prompted with a key on first connect:
    WinSCP Key
  5. Upon clicking Yes, you’ll be connected to the home folder. If you get a prompt that it’s listening on FTP, something went awry because the install process we followed does not include FTP. Check the information that you plugged in and try the connection again.
  6. WinSCP integrates with the taskbar for quick launching:
    WinSCP Taskbar

PuTTY

The biggest tool in your Linux-controlling arsenal will be PuTTY. This gem is an SSH client for Windows. SSH (secure shell) is how you remote control Linux systems. Use it instead of Hyper-V’s virtual machine connection. It’s really just a remote console. PuTTY, however, adds functionality on top of that. It can keep sessions and it gives you dead-simple copy/paste functionality. Highlight text, and it’s copied. Right-click the window, and it’s pasted at the cursor location.

  1. Download PuTTY. I use the installer package myself, but do as you like.
  2. Type in the host name or IP address in that field.
    PuTTY Profiles
  3. PuTTY doesn’t let you save credentials. But, you can save the session. Type a name for it in the Saved Sessions field and then click Save to add it to the list. Clicking Load on an item, or double-clicking it, will populate the connection field with the saved details.
  4. Click Open when ready. On the first connection, you’ll have to accept the host key:
    PuTTY Key
  5. You’ll then have to enter your login name and password. Then you’ll be brought to the same type of screen that you saw in the console:
    PuTTY Console
  6. Right-click the title bar of PuTTY for a powerful menu. The menu items change based on the session status. I have restarted the operating system for the screenshot below so that you can see the Restart Session item. This allows you to quickly reconnect to a system that you dropped from… say, because you restarted it.
    PuTTY Menu
  7. PuTTY also has taskbar integration:
    PuTTY Taskbar
  8. When you’re all done, remember to use “exit” to end your session.

Your Journey Has Begun

From here, I leave you to explore your fresh new Linux environment. I’ll be back soon with an article on installing Nagios in CentOS so you can monitor your Hyper-V environment at no cost.

Hyper-V Backup Best Practices: Terminology and Basics

Hyper-V Backup Best Practices: Terminology and Basics


One of my very first jobs performing server support on a regular basis was heavily focused on backup. I witnessed several heart-wrenching tragedies of permanent data loss but, fortunately, played the role of data savior much more frequently. I know that most, if not all, of the calamities could have at least been lessened had the data owners been more educated on the subject of backup. I believe very firmly in the value of a solid backup strategy, which I also believe can only be built on the basis of a solid education in the art. This article’s overarching goal is to give you that education by serving a number of purposes:

  • Explain industry-standard terminology and how to apply it to your situation
  • Address and wipe away 1990s-style approaches to backup
  • Clearly illustrate backup from a Hyper-V perspective

Backup Terminology

Whenever possible, I avoid speaking in jargon and TLAs/FLAs (three-/four-letter acronyms) unless I’m talking to a peer that I’m certain has the experience to understand what I mean. When you start exploring backup solutions, you will have these tossed at you rapid-fire with, at most, brief explanations. If you don’t understand each and every one of the following, stop and read those sections before proceeding. If you’re lucky enough to be working with an honest salesperson, it’s easy for them to forget that their target audience may not be completely following along. If you’re less fortunate, it’s simple for a dishonest salesperson to ridiculously oversell backup products through scare tactics that rely heavily on your incomplete understanding.

  • Backup
  • Full/incremental/differential backup
  • Delta
  • Deduplication
  • Inconsistent/crash-consistent/application-consistent
  • Bare-metal backup/restore (BMB/BMR)
  • Disaster Recovery/Business Continuity
  • Recovery point objective (RPO)
  • Recovery time objective (RTO)
  • Retention
  • Rotation — includes terms such as Grandfather-Father-Son (GFS)

There are a number of other terms that you might encounter, although these are the most important for our discussion. If you encounter a vendor making up their own TLAs/FLAs, take a moment to investigate their meaning in comparison to the above. Most are just marketing tactics — inherently harmless attempts by a business entity trying to turn a coin by promoting its products. Some are more nefarious — attempts to invent a nail for which the company just conveniently happens to provide the only perfectly matching hammer (with an extra “value-added” price, of course).

Backup

This heading might seem pointless — doesn’t everyone know what a backup is? In my experience, no. In order to qualify as a backup, you must have a distinct, independent copy of data. A backup cannot have any reliance on the health or well-being of its source data or the media that contains that data. Otherwise, it is not a true backup.

Full/Incremental/Differential Backups

Recent technology changes and their attendant strategies have made this terminology somewhat less popular than in past decades, but it is still important to understand because it is still in widespread use. They are presented in a package because they make the most sense when compared to each other. So, I’ll give you a brief explanation of each and then launch into a discussion.

  • Full Backups: Full backups are the easiest to understand. They are a point-in-time copy of all target data.
  • Differential Backups: A differential backup is a point-in-time copy of all data that is different from the last full backup that is its parent.
  • Incremental Backups: An incremental backup is a point-in-time copy of all data that is different from the backup that is its parent.

The full backup is the safest type because it is the only one of the three that can stand alone in any circumstances. It is a complete copy of whatever data has been selected.

Full Backup

A differential backup is the next safest type. Remember the following:

  • To fully restore the latest data, a differential backup always requires two backups: the latest full backup and the latest differential backup. Intermediary differential backups, if any exist, are not required.
  • It is not necessary to restore from the most recent differential backup if an earlier version of the data is required.
  • Depending on what data is required and the intelligence of the backup application, it may not be necessary to have both backups available to retrieve specific items.

The following is an illustration of what a differential backup looks like:

Differential Backup

Each differential backup goes all the way back to the latest full backup as its parent. Also, notice that each differential backup is slightly larger than the preceding differential backup. That steady growth is the conventional wisdom on the matter. In theory, each differential backup contains the previous backup’s changes as well as any new changes. In reality, it depends entirely on the change pattern. A file backed up on Monday might have been deleted on Tuesday, so that part of the backup certainly won’t be larger. A file that changed on Tuesday might have had half of its contents removed on Wednesday, which would make that part of the backup smaller. A differential backup can range anywhere from essentially empty (if nothing changed) to as large as the source data (if everything changed). Realistically, though, you should expect each differential backup to be slightly larger than the previous one.

The following is an illustration of an incremental backup:

Incremental Backup

Incremental backups are best thought of as a chain. The above shows a typical daily backup in an environment that uses a weekly full with daily incrementals. If all data is lost and restoring to Wednesday’s backup is necessary, then every single night’s backup from Sunday onward will be necessary. If any one is missing or damaged, then it will likely not be possible to retrieve anything from that backup or any backup afterward. Therefore, incremental backups are the riskiest; they are also the fastest and consume the least amount of space.

Historically, full/incremental/differential backups have been facilitated by an archive bit in Windows. Anytime a file is changed, Windows sets its archive bit. The backup types operate with this behavior:

  • A full backup captures all target files and clears any archive bits that it finds.
  • A differential backup captures only target files that have their archive bit set and it leaves the bit in the state that it found it.
  • An incremental backup captures only files with the archive bit set and clears it afterward.

Archive Bit Example
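
If you would like to watch the archive bit in action, PowerShell and the venerable attrib command will show it. This is just a quick demonstration against a throwaway file; the path is an example:

    # the Archive flag appears in the attribute list whenever the file has changed since the bit was last cleared
    (Get-Item 'C:\Temp\example.txt').Attributes

    # clear the bit the way a full or incremental backup would
    attrib -A 'C:\Temp\example.txt'

    # modifying the file causes Windows to set the bit again
    Add-Content -Path 'C:\Temp\example.txt' -Value 'new data'
    (Get-Item 'C:\Temp\example.txt').Attributes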

Delta

“Delta” is probably the most overthought word in all of technology. It means “difference”. Do not analyze it beyond that. It just means “difference”. If you have $10 in your pocket and you buy an item for $8, the $8 that you spent is the “delta” between the amount of money that you had before the purchase and the amount of money that you have now.

The way that vendors use the term “delta” sometimes changes, but usually not by a great deal. In the earliest incarnation of “delta” as applied to backups that I am aware of, it meant intra-file changes. All previous backup types operated with individual files as the smallest level of granularity (not counting specialty backups such as Exchange item-level). Delta backups would analyze the blocks of individual files, making the granularity one step finer.

The following image illustrates the delta concept:

Delta Backup

A delta backup is essentially an incremental backup, but at the block level instead of the file level. Somebody got the clever idea to use the word “delta”, probably so that it wouldn’t be confused with “differential”, and the world thought it must mean something extra special because it’s Greek.

The major benefit of delta backups is that they use much less space than even incremental backups. The trade-off is the computing power needed to calculate deltas. The archive bit can tell the backup application whether a file needs to be scanned, but not which of its blocks changed. Backup systems that perform delta operations require some other method of change tracking.

Deduplication

Deduplication represents the latest iteration of backup innovation. The term explains itself quite nicely. The backup application searches for identical blocks of data and reduces them to a single copy.

Deduplication involves three major feats:

  • The algorithm that discovers duplicate blocks must operate in a timely fashion
  • The system that tracks the proper location of duplicated blocks must be foolproof
  • The system that tracks the proper location of duplicated blocks must use significantly less storage than simply keeping the original blocks

So, while deduplication is conceptually simple, implementations can depend upon advanced computer science.
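
To make the block-matching idea concrete, here is a minimal PowerShell sketch that estimates how many duplicate fixed-size blocks a single file contains. It is purely illustrative; the file path is a placeholder, and real deduplication engines use far more sophisticated chunking, indexing, and storage:

    $blockSize = 4KB
    $uniqueHashes = @{}
    $totalBlocks = 0
    $sha = [System.Security.Cryptography.SHA256]::Create()
    $stream = [System.IO.File]::OpenRead('C:\Temp\sample.vhdx')
    $buffer = New-Object byte[] $blockSize
    while (($bytesRead = $stream.Read($buffer, 0, $blockSize)) -gt 0)
    {
        # identical blocks produce identical hashes, so the hashtable ends up with one entry per unique block
        $blockHash = [System.BitConverter]::ToString($sha.ComputeHash($buffer, 0, $bytesRead))
        $uniqueHashes[$blockHash] = $true
        $totalBlocks++
    }
    $stream.Dispose()
    "Blocks read: $totalBlocks; unique blocks: $($uniqueHashes.Count)"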

Deduplication’s primary benefit is that it can produce backups that are even smaller than delta systems. Part of that will depend on the overall scope of the deduplication engine. If you were to run fifteen new Windows Server 2016 virtual machines through even a rudimentary deduplicator, it would reduce all of them to the size of approximately a single Windows Server 2016 virtual machine — a 93% savings.

There is risk in overeager implementations, however. With all data blocks represented by a single copy, each block becomes a single point of failure. The loss of a single vital block could spell disaster for a backup set. This risk can be mitigated by employing a single pre-existing best practice: always maintain multiple backups.

Inconsistent/Crash-Consistent/Application-Consistent

We already have an article set that explores these terms in some detail. Quickly:

  • An inconsistent backup is effectively the same thing as performing a manual file copy of a directory tree.
  • A crash-consistent backup captures data as it sits on the storage volume at a given point in time, but cannot touch anything passing through the CPU or waiting in memory. You could lose any in-flight I/O operations.
  • An application-consistent backup coordinates with the operating system and, where possible, individual applications to ensure that in-flight I/Os are flushed to disk so that there are no active file changes at the moment that the backup is taken.

I occasionally see people twisting these terms around, although I believe that’s mostly accidental. The definitions that I used above have been the most common ones, stretching back into the 90s. Be aware that there is some disagreement, so ensure that you clarify terminology with any salespeople.

Bare-Metal Backup/Restore

A so-called “bare-metal backup” and/or “bare metal restore” involves capturing the entirety of a storage unit including metadata portions such as the boot sector. These backup/restore types essentially mean that you could restore data to a completely empty physical system without needing to install an operating system and/or backup agent on it first.

Disaster Recovery/Business Continuity

The terms “Disaster Recovery” (DR) and “Business Continuity” are often used somewhat interchangeably in marketing literature. “Disaster Recovery” is the older term and more accurately reflects the nature of the involved solutions. “Business Continuity” is a newer, more exciting version that sounds more positive but mostly means the same thing. These two terms encompass not just restoring data, but restoring the organization to its pre-disaster state. “Business Continuity” is used to emphasize the notion that, with proper planning, disasters can have little to no effect on your ability to conduct normal business. Of course, the more “continuous” your solution is, the higher your costs are. That’s not necessarily a bad thing, but it must be understood and expected.

One thing that I really want to make very clear about disaster recovery and/or business continuity is that these terms extend far beyond just backing up and restoring your data. DR plans need to include downtime procedures, phone trees, alternative working sites, and a great deal more. You need to think all the way through a disaster from the moment that one occurs to the moment that everything is back to some semblance of normal.

Recovery Point Objective

The maximum acceptable span of time between the latest backup and a data loss event is called a recovery point objective (RPO). If the words don’t sound very much like their definition, that’s because someone worked really hard to couch a bad situation within a somewhat neutral term. If it helps, the “point” in RPO means “point in time.” Of all data adds and changes, anything that happens between backup events has the highest potential of being lost. Many technologies have some sort of fault tolerance built in; for instance, if your domain controller crashes and it isn’t due to a completely failed storage subsystem, you’re probably not going to need to go to backup. Most other databases can tell a similar story. RPOs mostly address human error and disaster. More common failures should be addressed by technology branches other than backup, such as RAID.

A long RPO means that you are willing to lose a greater span of time. A daily backup gives you a 24-hour RPO. Taking backups every two hours results in a 2-hour RPO. Remember that an RPO represents a maximum. It is highly unlikely that a failure will occur immediately prior to the next backup operation.

Recovery Time Objective

Recovery time objective (RTO) represents the maximum amount of time that you are willing to wait for systems to be restored to a functional state. This term sounds much more like its actual meaning than RPO. You need to take extra care when talking with backup vendors about RTO. They will tend to only talk about RTO in terms of restoring data to a replacement system. If your primary site is your only site and you don’t have a contingency plan for complete building loss, your RTO is however long it takes to replace that building, fill it with replacement systems, and restore data to those systems. Somehow, I suspect that a six-month or longer RTO is unacceptable for most institutions. That is one reason that DR planning must extend beyond taking backups.

In more conventional usage, RTOs will be explained as though there is always a target system ready to receive the restored data. So, if your backup drives are taken offsite to a safety deposit box by the bookkeeper when she comes in at 8 AM, your actual recovery time is essentially however long it takes someone to retrieve the backup drive plus the time needed to perform a restore in your backup application.

Retention

Retention is the desired amount of time that a backup should be kept. This deceptively simple description hides some complexity. Consider the following:

  • Legislation mandates a ten-year retention policy on customer data for your industry. A customer was added in 2007. Their address changed in 2009. Must the customer’s data be kept until 2017 or 2019?
  • Corporate policy mandates that all customer information be retained for a minimum of five years. The line-of-business application that you use to record customer information never deletes any information that was placed into it and you have a copy of last night’s data. Do you need to keep the backup copy from five years ago or is having a recent copy of the database that contains five-year-old data sufficient?

Questions such as these can plague you. Historically, monthly and annual backup tapes were simply kept for a specific minimum number of years and then discarded, which more or less answered the question for you. Tape is an expensive solution, however, and many modern small businesses do not use it. Furthermore, laws and policies only dictate that the data be kept; nothing forced anyone to ensure that the backup tapes were readable after any specific amount of time. One lesson that many people learn the hard way is that tapes stored flat can lose data after a few years. We used to joke with customers that their bits were sliding down the side of the tape. I don’t actually understand the governing electromagnetic phenomenon, but I can verify that it does exist.

With disk-based backups, the possibilities are changed somewhat. People typically do not keep stacks of backup disks lying around, and their ability to hold data for long periods of time is not the same as backup tape. The rules are different — some disks will outlive tape, others will not.

Rotation

Backup rotations deal with the media used to hold backup information. This has historically meant tape, and tape rotations often came in some very grandiose schemes. One of the most widely used rotations is called “Grandfather-Father-Son” (GFS):

  • One full backup is taken monthly. The media it is taken on is kept for an extended period of time, usually one year. One of these is often considered an annual and kept longer. This backup is called the “Grandfather”.
  • Each week thereafter, on the same day, another full backup is taken. This media is usually rotated so that it is re-used once per month. This backup is known as the “Father”.
  • On every day between full backups, an incremental backup is taken. Each day’s media is rotated so that it is re-used on the same day each week. This backup is known as the “Son”.

The purpose of rotation is to have enough backups to provide sufficient possible restore points to guard against a myriad of possible data loss instances without using so much media that you bankrupt yourself and run out of physical storage room. Grandfathers are taken offsite and placed in long-term storage. Fathers are taken offsite, but perhaps not placed in long-term storage so that they are more readily accessible. Sons are often left onsite, at least for a day or two, to facilitate rapid restore operations.

Replacing Old Concepts with New Best Practices

Some backup concepts are simply outdated, especially for the small business. Tape used to be the only feasible mass storage medium that could be written and rewritten on a daily basis and was still portable enough to carry offsite. I recall being chastised by a vendor representative in 2004 because I was “still” using tape when I “should” have been backing up to his expensive SAN. I asked him, “Oh, do employees tend to react well when someone says, ‘The building is on fire! Grab the SAN and get out!’?” He suddenly didn’t want to talk to me anymore.

The other somewhat outdated issue is that backups used to take a very, very long time. Tape was not very fast, disks were not very fast, networks were not very fast. Differential and incremental backups were partly an answer to that problem and partly an answer to the limited capacity of tape. Today, we have gigantic and relatively speedy portable hard drives, networks that can move at least many hundreds of megabits per second, and external buses like USB 3 that outrun both of those things. We no longer need all weekend and an entire media library to perform a full backup.

One thing that has not changed is the need for backups to exist offsite. You cannot protect against a loss of a building if all of your data stays in that building. Solutions have evolved, though. You can now afford to purchase large amounts of bandwidth and transmit your data offsite to your alternative business location(s) each night. If you haven’t got an alternative business location, there are an uncountable number of vendors that would be happy to store your data each night in exchange for a modest (or not so modest) sum of money. I still counsel periodically taking an offline offsite backup copy, as that is a solid way to protect your organization against malicious attacks (some of which can be by disgruntled staff).

These are the approaches that I would take today that would not have been available to me a few short years ago:

  • Favor full backups whenever possible — incremental, differential, delta, and deduplicated backups are wonderful, but they are incomplete by nature. It must never be forgotten that the strength of backup lies in the fact that it creates duplicates of data. Any backup technique that reduces duplication dilutes the purpose of backup. I won’t argue against anyone saying that there are many perfectly valid reasons for doing so, but such usage must be balanced. Backup systems are larger and faster than ever before; if you can afford the space and time for full copies, get full copies.
  • Steer away from complicated rotation schemes like GFS whenever possible. Untrained staff will not understand them and you cannot rely on the availability of trained staff in a crisis.
  • Encrypt every backup every time.
  • Spend the time to develop truly meaningful retention policies. You can easily throw a tape in a drawer for ten years. You’ll find that more difficult with a portable disk drive. Then again, have you ever tried restoring from a ten-year-old tape?
  • Be open to the idea of using multiple backup solutions simultaneously. If using a combination of applications and media types solves your problem and it’s not too much overhead, go for it.

There are a few best practices that are just as applicable now as ever:

  • Periodically test your backups to ensure that data is recoverable
  • Periodically review what you are backing up and what your rotation and retention policies are to ensure that you are neither shorting yourself on vital data nor wasting backup media space on dead information
  • Backup media must be treated as vitally sensitive mission-critical information and guarded against theft, espionage, and damage
    • Magnetic media must be kept away from electromagnetic fields
    • Tapes must be stored upright on their edges
    • Optical media must be kept in dark storage
    • All media must be kept in a cool environment with a constant temperature and low humidity
  • Never rely on a single backup copy. Media can fail, get lost, or be stolen. Backup jobs don’t always complete.

Hyper-V-Specific Backup Best Practices

I want to dive into the nuances of backup and Hyper-V more thoroughly in later articles, but I won’t leave you here without at least bringing them up.

  • Virtual-machine-level backups are a good thing. That might seem a bit self-serving since I’m writing for Altaro and they have a virtual-machine-level backup application, but I fit well here because of shared philosophy. A virtual-machine-level backup gives you the following:
    • No agent installed inside the guest operating system
    • Backups are automatically coordinated for all guests, meaning that you don’t need to set up some complicated staggered schedule to prevent overlaps
    • No need to reinstall guest operating systems separately from restoring their data
  • Hyper-V versions prior to 2016 do not have a native changed block tracking mechanism, so virtual-machine-level backup applications that perform delta and/or deduplication operations must perform a substantial amount of processing. Keep that in mind as you are developing your rotations and scheduling.
  • Hyper-V will coordinate between backup applications that run at the virtual-machine-level (like Altaro VM) and the VSS writer(s) within guest Windows operating systems and the integration components within Linux guest operating systems. This enables application-consistent backups without doing anything special other than ensuring that the integration components/services are up-to-date and activated.
  • For physical installations, no application can perform a bare metal restore operation any more quickly than you can perform a fresh Windows Server/Hyper-V Server installation from media (or better yet, a WDS system). Such a physical server should only have very basic configuration and only backup/management software installed. Therefore, backing up the management operating system is typically a completely pointless endeavor. If you feel otherwise, I want to know what you installed in the management operating system that would make a bare-metal restore worth your time, as I’m betting that such an application or configuration should not be in the management operating system at all.
  • Use your backup application’s ability to restore a virtual machine next to its original so that you can test data integrity

Follow-Up Articles

With the foundational material supplied in this article, I intend to work on further posts that expand on these thoughts in greater detail. If you have any questions or concerns about backing up Hyper-V, let me know. Anything that I can’t answer quickly in comments might find its way into an article.

PowerShell Script: Change Advanced Settings of Hyper-V Virtual Machines

Each Hyper-V virtual machine sports a number of settings that can be changed, but not by any sanctioned GUI tools. If you’re familiar with WMI, these properties are part of the Msvm_VirtualSystemSettingData class. Whether you’re familiar with WMI or not, these properties are not simple to change. I previously created a script that modifies the BIOS GUID setting, but that left out all the other available fields. So, I took that script back into the workshop and rewired it to increase its reach.
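
If you would like to peek at those values before changing anything, a read-only query like the one below will show them. The namespace and class are the real ones that Hyper-V uses; the VM name “demovm” is a placeholder, and the property names are listed as I know them, so adjust to suit your environment:

    Get-CimInstance -Namespace 'root\virtualization\v2' -ClassName 'Msvm_VirtualSystemSettingData' |
        Where-Object { $_.ElementName -eq 'demovm' -and $_.VirtualSystemType -eq 'Microsoft:Hyper-V:System:Realized' } |
        Select-Object BIOSGUID, BaseboardSerialNumber, BIOSSerialNumber, ChassisAssetTag, ChassisSerialNumber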

If you’re fairly new to using PowerShell as a scripting language and use other people’s scripts to learn, there are some additional notes after the script contents that you might be interested in.

What this Script Does

This script can be used to modify the following properties of a Hyper-V virtual machine:

  • BIOS GUID: The BIOS of every modern computer should contain a Universally Unique Identifier (UUID). Microsoft’s implementation of the standard UUID is called a Globally Unique Identifier (GUID). You can see it on any Windows computer with gwmi Win32_ComputerSystemProduct | select UUID. On physical machines, I don’t believe that it’s possible to change the UUID. On virtual machines, it is possible and even necessary at times. You can provide your own GUID or have the script generate a new one.
  • Baseboard serial number.
  • BIOS serial number.
  • Chassis asset tag.
  • Chassis serial number.

There are other modifiable fields on this particular WMI class, but these are the only fields that I’m certain have no effect on the way that Hyper-V handles the virtual machine.

Warning 1: Changes to these fields are irreversible without restoring from backup. Modification of the BIOS GUID field is likely to trigger software activation events. Other side effects, potentially damaging, may occur. Any of these fields may be tracked and used by software inside the virtual machine. Any of these fields may be tracked and used by third-party software that manipulates the virtual machine. Use this script at your own risk.

Warning 2: These settings cannot be modified while the virtual machine is on. It must be in an Off state (not Saved or Paused). This script will turn off a running virtual machine (you are prompted first). It will not change anything on saved or paused VMs.

Warning 3: Only the active virtual machine is impacted. If the virtual machine has any checkpoints, they are left as-is. That means that if you delete the checkpoints, the new settings will be retained. If you apply or revert to a checkpoint, the old settings will be restored. I made the assumption that this behavior would be expected.

The following safety measures are in place:

  • The script is marked as High impact, which means that it will prompt before doing anything unless you supply the -Force parameter or have your confirmation preference set to a dangerous level. It will prompt up to two times: once if the virtual machine is running (because the VM must be off before the change can occur) and once when performing the change.
  • The script will only accept a single virtual machine at a time. Of course, it can operate within a foreach block so this barrier can be overcome. The intent was to prevent accidents.
  • If a running virtual machine does not shut down within the allotted time, the script exits. The default wait time is 5 minutes, overridable by specifying the Timeout parameter. The timeout is measured in seconds. If the virtual machine’s guest shutdown process was properly triggered, it will continue to attempt to shut down and this script will not try to turn it back on.
  • If a guest’s shutdown integration service does not respond (which includes guests that don’t have a shutdown integration service) the script will exit without making changes. If you’re really intent on making changes no matter what, I’d use the built-in Stop-VM cmdlet first.

Script Requirements

The script is designed to operate via WMI, so the Hyper-V PowerShell module is not required. However, it can accept output from Get-VM for its own VM parameter.

You must have administrative access on the target Hyper-V host. The script does not check for this status because you might be targeting a remote computer. I did not test the script with membership in “Hyper-V Administrators”. That group does not always behave as expected, but it might work.

Set-VMAdvancedSettings

Copy/paste the contents of the code block to a text editor on your system and save the file as Set-VMAdvancedSettings.ps1. As-is, you call the script directly. If you uncomment the first two lines and the last line, you will convert the script to an advanced function that can be dot-sourced or added to your profile.
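
Once you have it saved (or converted to a function), an invocation looks something like the following. The -VM, -Timeout, and -Force parameters are described above; the property parameter names shown here are only a guess at the pattern, so check the param() block of your copy for the authoritative names:

    .\Set-VMAdvancedSettings.ps1 -VM svtest -BIOSSerialNumber '1234-5678-90' -ChassisAssetTag 'IT-00042' -Timeout 600 -Force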

Script Notes for Other Scripters

I hope that at least some of you are using my scripts to advance your own PowerShell knowledge. This script shows two internal functions used for two separate purposes.

Code/Script Reusability

Sometimes, you’ll find yourself needing the same functionality in multiple scripts. You could rewrite that functionality each time, but I’m sure you don’t need me to tell you what’s wrong with that. You could put that functionality into another script and dot-source it into each of the others that rely on it. That’s a perfectly viable approach, but I don’t use it in my public scripts because I can’t guarantee what scripts will be on readers’ systems and I want to provide one-stop wherever possible. That leads to the third option: writing the functionality as a self-contained, reusable block that travels with each script.

Find the line that says function Process-WmiJob. As the comments say, I have adapted that from someone else’s work. I commonly need to use its functionality in many of my WMI-based scripts. So, I modularized it to use a universal set of parameters. Now I just copy/paste it into any other script that uses WMI jobs.

Making your code/script reusable can save you a great deal of time. Reused blocks have predictable results. The more you use them, the more likely you are to work out long-standing bugs and fine-tune them.

The problem with reused blocks is that they become disjointed. If I fix a problem that I find in the function within this script, I might or might not go back and update it everywhere else that I use it. In your own local scripting, you can address that problem by having a single copy of a script dot-sourced into all of your others. However, if each script file has its own copy of the function, it’s easier to customize it when necessary.

There’s no “right” answer or approach when it comes to code/script reusability. The overall goal is to reduce duplication of work, not to make reusable blocks. Never be so bound up in making a block reusable that you end up doing more overall work.

Don’t Repeat Yourself

A lofty goal related to code/script reusability is the Don’t Repeat Yourself principle (DRY). As I was reworking the original version of this script, I found that I was essentially taking the script block for the previous virtual machine property, copy/pasting it, and then updating the property and variable names. There was a tiny bit of customization on a few of them, but overall the blocks were syntactically identical. The script worked, and it was easy to follow along, but that’s not really an efficient way to write a script. Computers are quite skilled at repeating tasks. Humans, by contrast, quickly tire of repetition. Therefore, it only makes sense to let the computer do whatever you’d rather skip.

DRY also addresses the issue of tiny mistakes. Let’s say that I created the block for changing the ChassisSerialNumber by copying the ChassisAssetTag block, but forgot to update one reference to the ChassisAssetTag value. Your update would then apply the ChassisAssetTag that you specified to both the ChassisAssetTag field and the ChassisSerialNumber field, while the ChassisSerialNumber that you supplied would never be written anywhere. These types of errors are extremely common when you copy/paste/modify blocks.

Look at the line that says Change-VMSetting. That contains a fairly short bit of script that changes the properties of an object. I won’t dive too deeply into the details; the important part is that this particular function might be called up to five times during each iteration of the script. It’s only typed once, though. If I (or you) find a bug in it, there’s only one place that I need to make corrections.
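
As a loose illustration of the pattern (a sketch, not the actual function from the script), a single helper that accepts the property name and value can replace several nearly identical copied blocks:

    function Change-Setting
    {
        param($TargetObject, [string]$PropertyName, [string]$NewValue)
        # one shared spot for validation and assignment, regardless of which property is being changed
        if (-not [string]::IsNullOrEmpty($NewValue))
        {
            $TargetObject.$PropertyName = $NewValue
        }
    }

    # $SettingsData stands in for whatever object holds the modifiable fields
    Change-Setting -TargetObject $SettingsData -PropertyName 'ChassisAssetTag' -NewValue 'IT-00042'
    Change-Setting -TargetObject $SettingsData -PropertyName 'ChassisSerialNumber' -NewValue '1234-5678-90'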

Internal Functions in Your Own Scripts

Notice that I put my functions into a begin {} block. If this script is on the right side of a pipe, then its process {} block might be called multiple times. The functions only need to be defined once. Leaving them in the begin block provides a minor performance boost because that part of the script won’t need to be parsed on each pass.
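
A bare-bones illustration of that layout (a generic sketch, not an excerpt from the script above):

    function Set-SomethingAdvanced
    {
        begin
        {
            # defined once per pipeline invocation, no matter how many objects flow through
            function Process-Job { param($Job) <# shared helper logic would go here #> }
        }
        process
        {
            # runs once for every object that arrives on the pipeline
            Process-Job -Job $_
        }
    }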

I also chose to use non-standard verbs “Process” and “Change” for the functions. That’s because I can never be entirely certain about function names that might already be in the global namespace or that might be in other scripts that include mine. Programming languages tend to implement namespaces to avoid such naming collisions, but PowerShell does not have that level of namespace support just yet. Keep that problem in mind when writing your own internal functions.

Critical Status in Hyper-V Manager

I’m an admitted technophile. I like blinky lights and sleek chassis and that new stuff smell and APIs and clicking through interfaces. I wouldn’t be in this field otherwise. However, if I were to compile a list of my least favorite things about unfamiliar technology, that panicked feeling when something breaks would claim the #1 slot. I often feel that systems administration sits diametrically opposite medical care. We tend to be comfortable learning by poking and prodding at things while they’re alive. When they’re dead, we’re sweating — worried that anything we do will only make the situation worse. For many of us in the Hyper-V world, that feeling first hits with the sight of a virtual machine in “Critical” status.

If you’re there, I can’t promise that the world hasn’t ended. I can help you to discover what it means and how to get back on the road to recovery.

The Various “Critical” States in Hyper-V Manager

If you ever look at the underlying WMI API for Hyper-V, you’ll learn that virtual machines have a long list of “sick” and “dead” states. Hyper-V Manager distills these into a much smaller list for its display. If you have a virtual machine in a “Critical” state, you’re only given two control options: Connect and Delete:

crit_sample

We’re fortunate enough in this case that the Status column gives some indication as to the underlying problem. That’s not always the case. That tiny bit of information might not be enough to get you to the root of the problem.

For starters, be aware that any state that includes the word “Critical” typically means that the virtual machine’s storage location has a problem. The storage device might have failed. The host may not be able to connect to storage. If you’re using SMB 3, the host might be unable to authenticate.

You’ll notice that there’s a hyphen in the state display. Before the hyphen will be another word that indicates the current or last known power state of the virtual machine. In this case, it’s Saved. I’ve only ever seen three states:

  • Off-Critical: The virtual machine was off last time the host was able to connect to it.
  • Saved-Critical: The virtual machine was in a saved state the last time the host was able to connect to it.
  • Paused-Critical: The paused state typically isn’t a past condition. This one usually means that the host can still talk to the storage location, but it has run out of free space.

There may be other states that I have not discovered. However, if you see the word “Critical” in a state, assume a storage issue.

Learning More About the Problem

If you have a small installation, you probably already know enough at this point to go find out what’s wrong. If you have a larger system, you might only be getting started. With only Connect and Delete, you can’t find out what’s wrong. You need to start by discovering the storage location that’s behind all of the fuss. Since Hyper-V Manager won’t help you, it’s PowerShell to the rescue:
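
A pair of commands along these lines will do it (the VM name “demovm” is only a placeholder):

    Get-VM -VMName 'demovm' | Format-List *
    Get-VM -VMName 'demovm' | Format-List *Path*, *Location*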

Remember to use your own virtual machine name for best results. The first of those two lines will show you all of the virtual machine’s properties. It’s easier to remember in a pinch, but it also displays a lot of fields that you don’t care about. The second one pares the output list down to show only the storage-related fields. My output:

crit_powershellexample

The Status field specifically mentioned the configuration location. As you can see, the same storage location holds all of the components of this particular virtual machine. We are not looking at anything related to the virtual hard disks, though. For that, we need a different cmdlet:
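
For example (again with a placeholder VM name):

    Get-VMHardDiskDrive -VMName 'demovm'
    Get-VMHardDiskDrive -VMName 'demovm' | Select-Object -ExpandProperty Path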

Again, I recommend that you use the name of your virtual machine instead of mine. The first cmdlet will show a table display that includes the path of the virtual hard disk file, but it will likely be truncated. There’s probably enough to get you started. If not, the second shows the entire path.

crit_samplepsharddriveexample

Everything that makes up this virtual machine happens to be on the same SMB 3 share. If yours is on an iSCSI target, use iscsicpl.exe to check the status of connected disks. If you’re using Fibre Channel, your vendor’s software should be able to assist you.

Correcting the Problem

In my case, the Server service was stopped on the system that I use to host SMB 3 shares. It got that way because I needed to set up a scenario for this article. To return the virtual machine to a healthy state, I only needed to start that service and wait a few moments.

Your situation will likely be different from mine, of course. Your first goal is to rectify the root of the problem. If the storage is offline, bring it up. If there’s a disconnect, reconnect. After that, simply wait. Everything should take care of itself.

When I power down my test cluster, I tend to encounter this issue upon turning everything back on. I could start my storage unit first, but the domain controllers are on the Hyper-V hosts so nothing can authenticate to the storage unit even if it’s on. I could start the Hyper-V hosts first, but then the storage unit isn’t there to challenge authentication. So, I just power the boxes up in whatever order I come to them. All I need to do is wait — the Hyper-V hosts will continually try to reach storage, and they’ll eventually be successful.

If the state does not automatically return to a normal condition, restart the “Hyper-V Virtual Machine Management” service. You’ll find it by that name in the Services control panel applet. In an elevated PowerShell session:
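
    # vmms is the short name of the Hyper-V Virtual Machine Management service
    Restart-Service -Name vmms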

At an administrative command prompt:
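
    net stop vmms
    net start vmms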

That should clear up any remaining status issues. If it doesn’t, there is still an issue communicating with storage. Or, in the case of the Paused condition, it still doesn’t believe that the location has sufficient space to safely run the virtual machine(s).

Less Common Corrections

If you’re certain that the target storage location does not have issues and the state remains Critical, then I would move on to repairs. Try chkdsk. Try resetting/rebooting the storage system. It’s highly unlikely that the Hyper-V host is at fault, but you can also try rebooting that.

Sometimes, the constituent files are damaged or simply gone. Make sure that you can find the actual .xml (2012 R2 and earlier) or .vmcx (2016 and later) file that represents the virtual machine. Remember that it’s named with the virtual machine’s unique identifier. You can find that with PowerShell:
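
    # VMId holds the GUID that names the virtual machine's configuration file
    (Get-VM -VMName 'demovm').VMId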

If the files are misplaced or damaged, your best option is restore. If that’s not an option, then Delete might be your only choice. Delete will remove any remainders of the virtual machine’s configuration files, but will not touch any virtual hard disks that belong to the virtual machine. You can create a new one and reattach those disk files.

Best of luck to you.

Performance Impact of Hyper-V CPU Compatibility Mode

If there’s anything in the Hyper-V world that’s difficult to get good information on, it’s the CPU compatibility setting. Very little official documentation exists, and it only tells you why and how. I, for one, would like to know a bit more about the what. That will be the focus of this article.

What is Hyper-V CPU Compatibility Mode?

Hyper-V CPU compatibility mode is a per-virtual machine setting that allows Live Migration to a physical host running a different CPU model (but not manufacturer). It performs this feat by masking the CPU’s feature set to one that exists on all CPUs that are capable of running Hyper-V. In essence, it prevents the virtual machine from trying to use any advanced CPU instructions that may not be present on other hosts.

Does Hyper-V’s CPU Compatibility Mode Impact the Performance of My Virtual Machine?

If you want a simple and quick answer, then: probably not. The number of people that will be able to detect any difference at all will be very low. The number of people that will be impacted to the point that they need to stop using compatibility mode will be nearly non-existent. If you use a CPU benchmarking tool, then you will see a difference, and probably a marked one. If that’s the only way that you can detect a difference, then that difference does not matter.

I will have a much longer-winded explanation, but I wanted to get that out of the way first.

How Do I Set CPU Compatibility Mode?

Luke wrote a thorough article on setting Hyper-V’s CPU compatibility mode. You’ll find your answer there.

A Primer on CPU Performance

For most of us in the computer field, CPU design is a black art. It requires understanding of electrical engineering, a field that combines physics and logic. There’s no way you’ll build a processor if you can’t comprehend both how a NAND gate functions and why you’d want it to do that. It’s more than a little complicated. Therefore, most of us have settled on a few simple metrics to decide which CPUs are “better”. I’m going to do “better” than that.

CPU Clock Speed

Clock speed is typically the first thing that people generally want to know about a CPU. It’s a decent bellwether for its performance, although an inaccurate one.

A CPU is a binary device. Most people interpret that to mean that a CPU operates on zeros and ones. That’s conceptually accurate but physically untrue. A CPU interprets electrical signals above a specific voltage threshold as a “one”; anything below that threshold is a “zero”. Truthfully speaking, even that description is wrong. The silicon components inside a CPU will react one way when sufficient voltage is present and a different way in the absence of such voltage. To make that a bit simpler, if the result of an instruction is “zero”, then there’s little or no voltage. If the result of an instruction is “one”, then there is significantly more voltage.

Using low and high voltages, we solve the problem of how a CPU functions and produces results. The next problem that we have is how to keep those instructions and results from running into each other. It’s often said that “time is what keeps everything from happening at once”. That is precisely the purpose of the CPU clock. When you want to send an instruction, you ensure that the input line(s) have the necessary voltages at the start of a clock cycle. When you want to check the results, you check the output lines at the start of a clock cycle. It’s a bit more complicated than that, and current CPUs time off of more points than just the beginning of a clock cycle, but that’s the gist of it.

cpucompat_clockcycles

From this, we can conclude that increasing the clock speed gives us more opportunities to input instructions and read out results. That’s one way that performance has improved. As I said before, though, clock speed is not the most accurate predictor of performance.

Instructions per Cycle

The clock speed merely sets how often data can be put into or taken out of a CPU. It does not directly control how quickly a CPU operates. When a CPU “processes” an instruction, it’s really just electrons moving through logic gates. The clock speed can’t make any of that go any more quickly. It’s too easy to get bogged down in the particulars here, so we’ll just jump straight to the end: there is no guarantee that a CPU will be able to finish any given instruction in a clock cycle. That’s why overclockers don’t turn out earth-shattering results on modern CPUs.

That doesn’t mean that the clock speed isn’t relevant. It’s common knowledge that an Intel 386 performed more instructions per cycle than a Pentium 4. However, the 386 topped out at 33 MHz whereas the Pentium 4 started at over 1 GHz. No one would choose to use a 386 against a Pentium 4 when performance matters. However, when the clock speeds of two different chips are closer, internal efficiency trumps clock speed.

Instruction Sets

Truly exploring the depths of “internal efficiency” would take our little trip right down Black Arts Lane. I only have a 101-level education in electrical engineering, so I certainly will not be a chauffeur on that trip. However, the discussion includes instruction sets, a very large subtopic that is directly related to the subject of interest in this article.

CPUs operate with two units: instructions and data. Data is always data, but if you’re a programmer, you probably use the term “code”. “Code” goes through an interpreter or a compiler which “decodes” it into an instruction. Every CPU that I’ve ever worked with understood several common instructions: PUSH, POP, JE, JNE, EQ, etc. (for the sake of accuracy, those are actually codes, too, but I figure it’s better than throwing a bunch of binary at you). All of these instructions appear in the 80x86 (often abbreviated as x86) and the AMD64 (often abbreviated as x64) instruction sets. If you haven’t figured it out by now, an instruction set is just a gathering of CPU instructions.

If you’ve been around for a while, you’ve probably at least heard the acronyms “CISC” and “RISC”. They’re largely marketing terms, but they have some technical merit. These acronyms stand for:

  • CISC: Complex Instruction Set Computer
  • RISC: Reduced Instruction Set Computer

In the abstract, a CISC system has all of the instructions available. A RISC system has only some of those instructions available. RISC is marketed as being faster than CISC, based on these principles:

  • I can do a lot of adding and subtracting more quickly than you can do a little long division.
  • With enough adding and subtracting, I have nearly the same outcome as your long division.
  • You don’t do that much long division anyway, so what good is all of that extra infrastructure to enable long division?

On the surface, the concepts are sound. In practice, it’s muddier. Maybe I can’t really add and subtract more quickly than you can perform long division. Maybe I can, but my results are so inaccurate that my work constantly needs to be redone. Maybe I need to do long division a lot more often than I thought. Also, there’s the ambiguity of it all. There’s really no such thing as a “complete” instruction set; we can always add more. Does a “CISC” 80386 become a “RISC” chip when the 80486 debuts with a larger instruction set? That’s why you don’t hear these terms anymore.

Enhanced Instruction Sets and Hyper-V Compatibility Mode

We’ve arrived at a convenient segue back to Hyper-V. We don’t think much about RISC vs. CISC, but that’s not the only instruction set variance in the world. Instruction sets grow because electrical engineers are clever types and they tend to come up with new tasks, quicker ways to do old tasks, and ways to combine existing tasks for more efficient results. They also have employers that need to compete with other employers that have their own electrical engineers doing the same thing. To achieve their goals, engineers add instructions. To achieve their goals, employers bundle the instructions into proprietary instruction sets. Even the core x86 and x64 instruction sets go through revisions.

When you Live Migrate a virtual machine to a new host, you’re moving active processes. The system already initialized those processes to a particular instruction set. Some applications implement logic to detect the available instruction set, but no one checks it on the fly. If that instruction set were to change, your Live Migration would quickly become very dead. CPU compatibility mode exists to address that problem.

The Technical Differences of Compatibility Mode

If you use a CPU utility, you can directly see the differences that compatibility mode makes. These screen shot sets were taken of the same virtual machine on AMD and Intel systems, first with compatibility mode off, then with compatibility mode on.

cpucompat_amdcompat

cpucompat_intelcompat

The first thing to notice is that the available instruction set list shrinks just by setting compatibility mode, but everything else stays the same.

The second thing to notice is that the instruction sets are always radically different between an AMD system and an Intel system. That’s why you can’t Live Migrate between the two even with compatibility mode on.

Understanding Why CPU Compatibility Mode Isn’t a Problem

I implied in an earlier article that good systems administrators learn about CPUs and machine instructions and code. This is along the same lines, although I’m going to take you a bit deeper, to a place that I have little expectation that many of you would go on your own. My goal is to help you understand why you don’t need to worry about CPU compatibility mode.

There are two generic types of software application developers/toolsets:

  • Native/unmanaged: Native developers/toolsets work at a relatively low level. Their languages of choice will be assembler, C, C++, D, etc. The code that they write is built directly to machine instructions.
  • Interpreted/managed: The remaining developers use languages and toolsets whose products pass through at least one intermediate system. Their languages of choice will be Java, C#, Javascript, PHP, etc. Those languages rely on external systems that are responsible for translating the code into machine instructions as needed, often on the fly (Just In Time, or “JIT”).

These divisions aren’t quite that rigid, but you get the general idea.

Native Code and CPU Compatibility

As a general rule, developing native code for enhanced CPU instruction sets is a conscious decision made twice. First, you must instruct your compiler to use these sets:

There might be some hints here about one of my skunkworks projects

These are just the extensions that Visual Studio knows about. For anything more, you’re going to need some supporting files from the processor manufacturer. You might even need to select a compiler that has support built-in for those enhanced sets.

Second, you must specifically write code that calls on instructions from those sets. SSE code isn’t something that you just accidentally use.

Interpreted/Managed Code and CPU Compatibility

When you’re writing interpreted/managed code, you don’t (usually) get to decide anything about advanced CPU instructions. That’s because you don’t compile that kind of code to native machine instructions. Instead, a run-time engine operates your code. In the case of scripting languages, that happens on the fly. Languages like Java and C# are first compiled (is that the right word for Java?) into an intermediate format: Java becomes bytecode, and C# becomes Common Intermediate Language (CIL), which is itself a form of bytecode. That intermediate code is then executed by an interpreter.

It’s the interpreter that has the option of utilizing enhanced instruction sets. I don’t know whether any of them actually do that, but these interpreters all run on a wide range of hardware, which ensures that their developers verify the existence of any enhancements that they intend to use.

What These Things Mean for Compatibility

What this all means is that even if you don’t know if CPU compatibility affects the application that you’re using, the software manufacturer should certainly know. If the app requires the .Net Framework, then I would not be concerned at all. If it’s native/unmanaged code, the manufacturer should have had the foresight to list any required enhanced CPU capabilities in their requirements documentation.

In the absence of all other clues, these extensions are generally built around boosting multimedia performance. Video and audio encoding and decoding operations feature prominently in these extensions. If your application isn’t doing anything like that, then the odds are very low that it needs these extensions.

What These Things Do Not Mean for Compatibility

No matter what, your CPU’s maximum clock speed will be made available to your virtual machines. There is no throttling, there is no cache limiting, there is nothing other than a reduction of the available CPU instruction sets. Virtual machine performance is unlikely to be impacted at all.
