The title of this article describes the symptoms fairly well. You Live Migrate a virtual machine that’s backed by SMB storage, and the permissions shift in a way that prevents the virtual machine from being used. You’d have to be fairly sharp-eyed to notice before it causes problems, though. I didn’t catch on until virtual machines started failing because the hosts didn’t have sufficient permissions to start them. I don’t have a true fix, meaning that I can’t prevent the permissions from changing. However, I can show you how to eliminate the problem.
The root problem also affects local and Cluster Shared Volume locations, although the default permissions generally prevent blocking problems from manifesting.
I have experienced the problem on both 2012 R2 and 2016. The Hyper-V host causes the problem, so the operating system running on the SMB system doesn’t matter.
Symptom of Broken NTFS Permissions for Hyper-V
I discovered the problem when one of my nodes went down for maintenance and all of its virtual machines crashed. It only affected my test cluster, which I don’t keep a close eye on. That means that I can’t tell you when this became a problem. I do know that this behavior is fairly new (sometime in late 2016 or 1Q/2Q 2017).
Symptom 1: Cluster event logs will fill up with the generic access denied (0x80070005) message.
For example, Hyper-V-VMMS; Event ID 20100:
The Virtual Machine Management Service failed to register the configuration for the virtual machine '04C7BE1C-ECAC-4947-9D7D-775E28F3B76E' at 'svstorevms': General access denied error (0x80070005). If the virtual machine is managed by a failover cluster, ensure that the file is located at a path that is accessible to other nodes of the cluster.
Hyper-V-High-Availability; Event ID 21502:
Live migration of 'Virtual Machine svmanage' failed.
Virtual machine migration operation for 'svmanage' failed at migration destination 'svhv2'. (Virtual machine ID 974174B7-A3F2-471C-91C2-5081832ACB5A)
User 'NT AUTHORITYSYSTEM' failed to create external configuration store at 'svstorevms': General access denied error. (0x80070005)
You will also have several of the more generic FailoverClustering IDs 1069, 1205, and 1254 and Hyper-V-High-Availability IDs 21102 and 21111 as the cluster service desperately tries to sort out the problem.
Symptom 2: Virtual machines disappear from Hyper-V Manager on all nodes while still appearing in Failover Cluster Manager.
Because the cluster can’t register the virtual machine ID on the target Hyper-V host, you won’t see it in Hyper-V Manager. The cluster still knows about it though. Remember that, even if they’re named the same, the objects that you see as Roles in Failover Cluster Manager are different objects than what you see in Hyper-V Manager. Don’t panic! As long as the cluster still knows about the objects, it can still attempt to register them once you’ve addressed the underlying problem.
I’m guessing that “helper” behavior gone awry has caused unintentional problems. When you Live Migrate a virtual machine, Hyper-V tries to “fix” permissions, even when they’re not broken. It adjusts the NTFS permissions for the host.
The GUI ACL looks like this:
The permission level that I set, and that I counsel everyone to set, is Full Control. As you can see, it’s been reduced. We click Advanced as the first investigative step and see:
The Access still only tells us Special, but we can see that inheritance did not cause this. Whatever changes the permissions is making the changes directly on this folder. This is the same folder that’s shared via SMB. Double-clicking the entry and then clicking the Show advanced permissions link at the right shows us the new permission set:
When I first found the permissions in this condition, I thought, “Huh, I wonder why/when I did that?” Then I set Full Control again. After the very next Live Migration, these permissions were back! Once I discovered that behavior, I tested other Live Migration types, such as using Cluster Shared Volumes. It does occur on those as well. However, the default permissions on CSVs have other entries that ensure that this particular issue does not prevent virtual machines from functioning. VMs on SMB shares don’t automatically have that kind of luck — but they can benefit from a similar configuration.
Permanently Correcting Live Migration NTFS Permission Problems
I don’t know why Hyper-V selects these particular permissions. I don’t know precisely which of those unchecked boxes cause these problems.
I do know how to prevent the problem from adversely affecting your virtual machines. In fact, even in the absence of the problem, I would label this as a “best practice” because it reduces overall administrative effort.
- In Active Directory (I’ll use Active Directory Users and Computers; you could also use PowerShell), create a new security group. For my test environment, I call mine “Hyper-V Hosts”. In a larger domain, you’ll likely want more granular groups.
- Select all of the Hyper-V hosts that you want in that new group. Right-click them and click Add to group.
- In the Select Groups dialog, enter or browse to the group that you just created. Click OK to add them.
- Restart the Workstation service on each of the Hyper-V hosts.
- On the target SMB system, add the new group to the ACL of the folder at the root of the share. I personally recommend that you change both SMB and NTFS permissions, although the problem only manifests on NTFS. Grant the group Full Control.
You will now be able to Live Migrate and start virtual machines from this SMB share. If your virtual machines disappeared from Hyper-V Manager, use Failover Cluster Manager to start and/or Live Migrate them. It will take care of any missing registrations.
Why Does this Work?
Through group permissions, the same object can effectively appear multiple times in a single NTFS ACL (access control list). When that happens, NTFS grants the least restrictive set of permissions. So, while the SVHV1’s specific ACE (access control entry) excludes Write attributes, the Hyper-V Hosts group’s ACE includes it. When NTFS accumulates all possible permissions that could apply to SVHV1, it will find an Allow entry for the Write attributes property (and others not set on ACE specific to SVHV1). If it found a Deny anywhere, that would override any conflicting Allow. However, there are no Deny settings, so that single Allow wins.
Do remember that when a computer accesses an NTFS folder through an SMB share, the permissions on that share must be at least as permissive as NTFS in order for access to work as expected. So, if the SMB permission only allows Read, then it won’t matter that the NTFS allows Full Control. When NTFS permissions and SMB permissions must be evaluated together, the most restrictive cumulative effect applies. I’m mostly telling you this for completeness; Hyper-V will not modify SMB permissions. If they worked before, they’ll continue to work. However, I do recommend that you add the same group with Full Control permissions to the share.
As I mentioned before, I recommend that you adopt the group membership tactic whether you need it or not. When you commission new Hyper-V hosts, you’ll only need to add them to the appropriate groups for SMB access to work automatically. When you decommission servers, you won’t need to go around cleaning up broken SID ACEs.