Extending Hyper-V’s Guest Grace Period on Host Shutdown
21 Jun 2017
When a Hyper-V host shuts down, how long does it wait for its virtual machines to shut down or save? Did you say five minutes? That’s what I said too! Well, we’re wrong.
Where that Five Minute Answer Comes From
When you tell Hyper-V Manager to shut down a guest, it waits five minutes before giving up. I’m not sure exactly where I heard that first, but I know it was from an authoritative source, like one of the Hyper-V program managers. I, and no doubt others, then extrapolated that to mean that five minutes is the timeout period for virtual machine shut down.
However, there’s nothing to support that. If you look at the API, the Hyper-V management service ignores any supplied timeout value. That means that any program that calls it can wait as long as it wants to or it can just issue the command and let it run forever. Hyper-V Manager waits five minutes because its developers coded it to do that. The host shut down process does not use Hyper-V Manager, though.
I didn’t put all that together until I saw some reports on the forums about virtual machines suffering hard shut downs during host shut down. Responders tried to help with the bit about five minutes. That has never seemed to help anyone.
Changing the VM Shutdown Timeout Via Regedit
On a Hyper-V host (tested on 2012 R2 and 2016), open Regedit.exe (it works on Hyper-V Server and Windows Server Core, too). Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization. Find the ShutdownTimeout value and change it to the timeout that you want, in seconds.
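If you would rather not click through Regedit on every host, the same change can be captured in a .reg file. What follows is only a minimal Python sketch that builds the text of such a file. The function name build_reg_file and the 600-second example are illustrative choices of mine, and the sketch assumes that ShutdownTimeout is a REG_DWORD holding seconds, as described in this article.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization"
VALUE_NAME = "ShutdownTimeout"


def build_reg_file(timeout_seconds: int) -> str:
    """Return .reg file text that sets ShutdownTimeout to the given value.

    Assumption: the value is a REG_DWORD interpreted as seconds.
    """
    if not 0 <= timeout_seconds <= 0xFFFFFFFF:
        raise ValueError("a REG_DWORD must fit in 32 unsigned bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY_PATH}]",
        # .reg files write DWORD data as eight hexadecimal digits
        f'"{VALUE_NAME}"=dword:{timeout_seconds:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # 600 seconds (10 minutes) is only an example value
    print(build_reg_file(600))
```

Save the output to a file with a .reg extension and import it on a test host before trying it anywhere that matters.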
Changing the VM Shutdown Timeout Via Group Policy
I highly recommend that you spend some time tinkering with a test host or a single guinea pig production host before pushing this out to all of your hosts.
Follow these steps to set the registry value through Group Policy:
- On a system with the necessary console installed, open Group Policy Management Console.
- Unless your Hyper-V hosts are already processing a great many policies, I recommend creating a new policy. Right-click Group Policy Objects and select New.
- Give your new GPO a descriptive name and click OK. If you’ll be using different values for different host categories, then remember to use a unique name.
- Right-click on your new GPO and click Edit.
- Drill down to Computer Configuration, Preferences, Windows Settings, Registry. Right-click Registry, hover over New, then click Registry Item.
- Use these settings in the New Registry Properties dialog:
  - Action: Update
  - Hive: HKEY_LOCAL_MACHINE
  - Key path: SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization
  - Value name (Default unchecked): ShutdownTimeout
  - Value type: REG_DWORD
  - Value data: <some reasonable number of seconds>
  - Base: Decimal
- Make sure that your new setting appears in the registry key list. Close the Group Policy Management Editor window to return to the main console.
- Under your domain, drill down to the OU that contains your Hyper-V host(s). Right-click on it and click Link an Existing GPO. If you want to use multiple settings, you’ll need to use separate GPOs or implement WMI Filtering.
- Select the GPO that you just created.
- Now, just wait for your next GPO push. Alternatively, log on to one or more hosts in the affected OU and run gpupdate /force in an elevated command/PowerShell prompt.
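Once the policy has had a chance to apply, you can confirm that the value actually landed on a host. Here is a minimal Python sketch that reads it back with the standard library's winreg module, which exists only on Windows. The function names and the expected value of 600 are hypothetical; substitute whatever you entered in the GPO.

```python
import sys

KEY_PATH = r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization"
VALUE_NAME = "ShutdownTimeout"


def read_shutdown_timeout() -> int:
    """Read ShutdownTimeout from the local registry (Windows only)."""
    import winreg  # standard library, but only importable on Windows

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return int(value)


def policy_applied(expected_seconds: int, reader=read_shutdown_timeout) -> bool:
    """Return True when the host's value matches what the GPO should have set."""
    return reader() == expected_seconds


if __name__ == "__main__":
    expected = 600  # hypothetical; use the number you put in the GPO
    if sys.platform == "win32":
        print("GPO value present:", policy_applied(expected))
    else:
        print("Run this on the Hyper-V host itself.")
```

The reader argument exists only so that the comparison can be exercised away from a Hyper-V host. On a real host, call policy_applied with just the expected number.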
I looked high and low and couldn’t find any documentation on this key. I don’t have the resources to do a great deal of testing, either. The default value is 120, and I think it’s safe to say that this number represents the timeout in seconds. Hosts clearly do not wait two hours (120 minutes) to shut down, and a 120-millisecond timeout would just be ridiculous.
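To make that reasoning concrete, here is a small Python sketch that converts the same raw number under each candidate unit. The value 120 is the default that this article works from; the names in the code are mine, and nothing here comes from official documentation.

```python
RAW_VALUE = 120  # the number this article treats as the default

# Seconds represented by one unit under each candidate interpretation
CANDIDATE_UNITS = {"milliseconds": 0.001, "seconds": 1, "minutes": 60}


def as_seconds(raw_value: int, unit: str) -> float:
    """Convert a raw registry number to seconds for the given unit."""
    return raw_value * CANDIDATE_UNITS[unit]


if __name__ == "__main__":
    for unit in CANDIDATE_UNITS:
        seconds = as_seconds(RAW_VALUE, unit)
        print(f"{RAW_VALUE} {unit} = {seconds:g} seconds")
```

Only the middle interpretation, two minutes, lands anywhere near how hosts are observed to behave.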
What I can’t be sure of is whether or not this applies to each virtual machine individually or in aggregate. However, given how long modern physical systems need to reboot, I’m also having a hard time thinking of a good reason for such a short timeout. If you’ve got a VM that takes a long time to shut down (Exchange guests, we’re all looking at you), then feel free to kick this number up.
I don’t know if you need to do anything else (undocumented, remember?), but I chose to restart VMMS (Virtual Machine Management Service). When I did that, my regedit screen flickered as VMMS was starting back up. That makes sense, as many of the settings found in this branch belong to VMMS. That doesn’t mean anything for this particular key. So, I used procmon.exe to watch a VMMS service startup. It doesn’t care at all about that key. So, restarting VMMS doesn’t do anything in this case.
For testing, I set my timeout to 10 minutes (decimal 600) and the host still shut down fairly quickly. That means that setting a number higher than you need won’t needlessly extend the shut down process.
Unfortunately, I’ve never been negatively affected by this problem, which makes true testing difficult. So, I’m asking you, dear reader, to try this out and report back on your findings.
Problems This Hopefully Fixes
I have a few goals for this setting. They also double as things to look for in your own tests.
1) Service Shutdown
Primarily, I want to see if those systems that need extra time to shut down receive that extra time. By default, Windows Server will only wait 50 seconds for any given service to shut down (HKLM\SYSTEM\CurrentControlSet\Control\WaitToKillServiceTimeout; expressed in milliseconds). Some applications might adjust this number upward. We do know that Windows applies this setting to each service individually. We don’t know how long it takes Windows to issue the shut down command to each service. I assume that all of them receive the shut down command in very tight sequence. If we fudge it, then maybe a total of one minute for shut down.
However, we’ve all seen services hang on shutdown for much longer than that. Some adjust that registry key because they know that they’ll need more time. I do know that the API allows a service to ask for more time when it’s told to shut down; I’m not sure if it can do that during system shut down as well, but I know I’ve seen services hang a host for a long time on shut down.
I also don’t know how Linux handles a long shut down cycle. I do have a couple of non-HA Ubuntu Server Linux VMs hosted on the local C: drive that periodically suffer catastrophic damage to their boot volume and need to be restored from backup (or, more probably, require a repair that I can’t figure out). I’m starting to suspect that they need a hair longer to shut down than they’re being given.
These are my primary interests with this setting.
2) Application Shutdown
We all know that servers should be logged out all the time. That’s mostly because every logged in user’s password sits in LSASS’s memory space in clear text. It’s also because logged in sessions use memory that could be put to better use. Unfortunately, we also have software vendors stuck in the 90s that don’t care about their customers’ security or resources and develop faux-service applications that run in Session 0 or a persistent desktop session. These are not subject to the service shutdown timeout.
When you manually shut down, any application that doesn’t respond to the shut down is given a timeout, and then a logged-in user is prompted to deal with it. For almost all shut down APIs, an automated shut down command can issue some variant of a “force” parameter that simply kills these problem children and continues the shut down.
In my testing, a host does not run down the shutdown timer waiting on these applications. It’s probable that the very short WaitToKillAppTimeout inside the guest takes precedence.
Let me know about your experiences.
3) Cumulative Effects
We have some numbers on individual timeouts, but the aggregates might change the equations. For example, if a guest’s service shut down is set to 50 seconds and the host’s shut down grace period is 120 seconds, that should work out, right? Well, I can think of one place where it wouldn’t. A guest takes 50 seconds to kill its services. Then, it starts applying four months’ worth of Windows Update patches. It’s got 70 seconds to get it done. Are you laughing? You’re laughing. Or grimacing. Either response is appropriate.
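The arithmetic in that example can be sketched in a few lines of Python. This assumes that the host’s grace period applies to each guest individually, which this article has not been able to confirm. The function name and the 15-minute patch duration are hypothetical.

```python
def remaining_grace(host_timeout_s: int, *guest_phase_durations_s: int) -> int:
    """Seconds left in the host's grace period after the listed guest phases.

    A negative result means the guest would be cut off before it finished.
    Assumption: the host timeout applies per guest, not in aggregate.
    """
    return host_timeout_s - sum(guest_phase_durations_s)


if __name__ == "__main__":
    host_timeout = 120  # host grace period in seconds
    service_stop = 50   # guest spends this long killing its services

    # Time left for everything else the guest wants to do
    print(remaining_grace(host_timeout, service_stop))  # 70

    update_install = 900  # hypothetical 15-minute patch run
    print(remaining_grace(host_timeout, service_stop, update_install))  # -830
```

A negative result is the case to watch for in your own tests: the guest still has work to do when the host stops waiting.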
Basically, we’re trying to figure out how multiple VMs with multiple applications and multiple services and multiple OS shut down tasks figure within the host’s grace shut down periods. Aggregate data from multiple sources (that’s all of you) can help us to come up with some guidelines.
Let’s Hear It
We’ve now entered the audience participation section of this article. If this setting clears up your shut down problem, I most definitely want to hear from you. If it doesn’t fix your problem but seems to make some sort of difference, that’s good information as well. Whatever you can share with us, please do so.