Advanced Troubleshooting of Hyper-V Replica – Part 31
In this final part of the article series, we'll tackle the remaining problems from my list that can disrupt Hyper-V replication.
Replication is paused at Replica Server or Primary Server
The Primary Server might not be able to replicate to the Replica Virtual Machine if replication is paused at the Replica Server or an administrator has deliberately paused it. An event is generated in the Event Viewer of the Primary Server under the Admin logs of VMMS, as shown in the screenshot below:
As shown in the screenshot above, Event ID 32088 is generated, which indicates that replication is suspended on the Replica Server for the virtual machine “RVM1”. Please note that the event message does not say whether replication was paused by an administrator or by a Hyper-V component. In either case, you must resume the replication to fix the issue.
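If you script your checks, resuming can be automated as well. The following is a minimal Python sketch, assuming it runs on the Hyper-V host where replication was paused and that the Hyper-V PowerShell module is installed there. `Resume-VMReplication` is the Hyper-V cmdlet for this task; the helper names `build_resume_command` and `resume_replication` are hypothetical and exist only for illustration.

```python
import subprocess


def build_resume_command(vm_name):
    """Build the PowerShell invocation that resumes replication for one VM."""
    # PowerShell escapes a single quote inside a single-quoted string by doubling it.
    safe_name = vm_name.replace("'", "''")
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"Resume-VMReplication -VMName '{safe_name}'",
    ]


def resume_replication(vm_name):
    """Run the command; only meaningful on a Windows host with the Hyper-V role."""
    return subprocess.run(
        build_resume_command(vm_name),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode and stderr instead of raising
    )
```

Calling `resume_replication("RVM1")` would attempt to resume replication for the virtual machine from the event above; check the returned `returncode` and `stderr` before assuming it succeeded.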
Inadequate Network Bandwidth between Primary and Replica Servers
The Hyper-V Primary Server requires enough network bandwidth to replicate HRL files to the Replica Server. How much is enough depends on the size of the data to be replicated over the network. Please make sure the Primary and Replica Servers are connected using a high-speed network connection. Since I plan to explain the required network bandwidth in a separate post, a full treatment of the topic is out of scope for this article.
Your virtual machine might be replicating unnecessary data, which can strain the available network bandwidth. For example, you would not want to replicate paging file content to the Replica Virtual Machine. It is always a best practice to exclude the paging file from Hyper-V replication. You can follow an article I wrote specifically to do so here.
Furthermore, virtual machine contents can be compressed before they are replicated, so it is advisable to enable compression for the virtual machine.
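To get a rough feel for whether a link can keep up, you can estimate how long one batch of changes takes to cross it. The sketch below is a back-of-the-envelope calculation only, not a sizing method: the function names, the `usable_fraction` parameter (the share of the link assumed to be available to replication), and the 300-second default interval are assumptions for illustration. Substitute the replication frequency you have actually configured.

```python
def estimate_transfer_seconds(changed_megabytes, link_megabits_per_second,
                              usable_fraction=0.7):
    """Estimate the seconds needed to send one batch of changed data."""
    if link_megabits_per_second <= 0:
        raise ValueError("link speed must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    megabits_to_send = changed_megabytes * 8  # 1 byte = 8 bits
    effective_rate = link_megabits_per_second * usable_fraction
    return megabits_to_send / effective_rate


def fits_in_interval(changed_megabytes, link_megabits_per_second,
                     interval_seconds=300, usable_fraction=0.7):
    """True if the batch can be sent before the next replication cycle is due."""
    needed = estimate_transfer_seconds(
        changed_megabytes, link_megabits_per_second, usable_fraction
    )
    return needed <= interval_seconds
```

For example, 1000 MB of changes over a fully available 100 Mbps link works out to 8000 megabits / 100 Mbps = 80 seconds. If the estimate regularly exceeds the replication interval, changes will accumulate faster than they can be sent.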
Replication Enabled but Initial Replication is not initiated yet
Please note that the “initial” replication for a virtual machine enabled for Hyper-V replication must be completed before the Primary Server can start tracking changes. The Primary Virtual Machine will be put into a “Warning” health status if the initial replication is not completed. There are three ways to replicate the base virtual machine data, as explained in the article below:
Not Enough Resources Available on Primary or Replica Server
Replication can fail if the Primary Server runs out of system resources. This is rare and happens only when many virtual machines are replicating to the Replica Server and tracked changes are not being sent to the Replica Server in a timely manner.
Not enough storage on Primary Server or Replica Server to store HRL files
Replication might also stop working if the Primary Server does not have enough storage available on the volume where HRL files are stored. The Primary Server tracks changes on the Primary Virtual Machine and creates HRL (Hyper-V Replication Log) files. The HRL files are stored in the directory where the virtual machine VHD files reside, and they are replicated to the Replica Server. If the Primary Server cannot create the HRL files due to a lack of disk space, replication cycles will be missed. This causes the Primary virtual machine to go into “Warning” health status and then into “Critical” health status once the Primary Server has lost control over the tracked changes. Hence, you need to make sure the Primary Server has enough space available on the volume where HRL files are stored.
Note: You cannot change the location of HRL files, but you can move the virtual machine (including its configuration and VHD files) to different storage where enough space is available.
The Primary Server may also run out of space if you have enabled multiple recovery points for a virtual machine. You must thoroughly understand the impact of enabling multiple recovery points on a volume where large HRL files are stored.
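A simple free-space check on the volume that holds the VHD and HRL files can warn you before replication cycles start being missed. The following is a minimal sketch using Python's standard `shutil.disk_usage`; the function name `free_space_report` and the 20 GiB default threshold are assumptions for illustration, so choose a threshold that matches how quickly your HRL files grow.

```python
import shutil


def free_space_report(path, min_free_gib=20.0):
    """Report free space on the volume containing `path` and flag low space."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_gib = usage.free / 1024 ** 3
    return {
        "path": path,
        "free_gib": round(free_gib, 2),
        "low": free_gib < min_free_gib,  # True when below the chosen threshold
    }
```

Point `path` at the directory where the virtual machine's VHD files reside, for example from a scheduled task, and raise an alert whenever `low` is true.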
Failover was done but Reverse Replication was not performed
If there are issues with the Primary Virtual Machine, a “Planned Failover” is initiated to bring the virtualized workloads online on the Replica Server. As part of this operation, replication is stopped at the Primary Server. In other words, the Primary Server stops tracking changes to the Primary Virtual Machine. Once the issue is fixed, replication must be reversed.
When the failover is initiated from the Primary Server, the Primary virtual machine is put into the “Prepared for Planned Failover” state, as shown in the screenshot below:
Event ID 32374 will also be generated on the Primary Server, as shown in the screenshot below:
On the Replica Server, the Replica virtual machine will be put into a “Warning” state, indicating that reverse replication must be done in order to resume replication for the virtual machine, as shown in the screenshot below:
This is expected behavior of Hyper-V Replica and does not indicate a fault. However, you must reverse replicate the virtualized workloads to resume replication. When you reverse the replication, the Replica Server takes on the role of the Primary Server and the Primary Server takes on the role of the Replica Server. Reversing is required if the virtual machine will keep running on the Replica Server and many changes are occurring on it.
In this final part of the article series, we learned how Hyper-V replication might fail if there is not enough space available on the Primary or Replica Server, and how critical it is to have a high-speed network connection between the Primary and Replica Servers if the changes being replicated are large.
We also saw how to identify whether replication for a virtual machine is paused and why it is necessary to reverse replicate the virtualized workload in order to resume normal replication.