Over the last several years, snapshots (or checkpoints, as they are sometimes called) have become a popular alternative to backups. Checkpoints allow a system to be rolled back to a previous state almost instantly, without the hassle of managing tapes or waiting for a restore to complete. Despite all of their capabilities, however, checkpoints are not a true alternative to backups. In fact, checkpoints have several major shortcomings, and IT pros should seriously consider whether they are making the best use of them. First, though, let’s quickly define the difference between checkpoints and snapshots.
What is the Difference Between a Checkpoint and a Snapshot?
The short answer is that, from a technical perspective, there is no difference. They are one and the same. The term snapshot was popularized largely by the VMware vSphere platform, which is the virtualization platform that many IT pros cut their teeth on. Hyper-V also used the snapshot terminology for some time but eventually settled on the term checkpoint instead. So, again, they are the same thing; only the name depends on the platform. VMware = Snapshot. Hyper-V = Checkpoint.
When Should I Use Snapshots?
As previously mentioned, checkpoints allow a system to be rolled back to a previous state almost instantly. A checkpoint can do this because, unlike a backup, it does not make a copy of your data. That same property is also what makes checkpoints unsuitable as a backup replacement.
This is not to say that checkpoints should never be used. Checkpoints do have their place. Checkpoints tend to work really well as a tool for protecting a virtual machine just prior to a configuration change. If a configuration change were to cause problems for a virtual machine, then the snapshot could be applied, effectively undoing the change.
Checkpoints are also useful to those who are performing software upgrades. If, for example, an operating system upgrade were to leave a virtual machine in an unbootable state, a checkpoint could be applied, thereby reverting the virtual machine’s operating system to its pre-upgrade state. The same basic concept also applies to application upgrades and to the installation of patches.
The Anatomy of a Snapshot
In order to understand why checkpoints are not suitable replacements for backups, it is necessary to understand how checkpoints work. There are a few different types of checkpoints, but for the purposes of this discussion, I will talk about the way that checkpoints work in Microsoft’s Hyper-V.
The vast majority of Hyper-V virtual machines make use of one or more virtual hard disks. A virtual hard disk is simply a VHD or VHDX file that acts as a hard disk for a virtual machine. Like a physical hard disk, a virtual hard disk file can include volumes, file systems, and of course, files. Under normal circumstances, the virtual hard disk file is read/write, meaning that the virtual machine can write data to and read data from the virtual hard disk. While this probably seems obvious, it is actually an important point.
When an administrator creates a checkpoint for a Hyper-V virtual machine, Hyper-V does not make a backup copy of the virtual hard disk file. Instead, it puts the virtual hard disk into a read-only state. Because the virtual hard disk is now read-only, Hyper-V creates a differencing disk that becomes a part of the virtual machine. This differencing disk (an AVHDX file) is essentially just a virtual hard disk file that has a parent/child relationship with the virtual machine’s original virtual hard disk file. A layout of this configuration can be seen below.
Because the virtual machine’s original virtual hard disk is now read-only, all write operations are directed to the differencing disk. This ensures that the original virtual hard disk file’s contents remain unchanged.
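To make the redirect concrete, here is a minimal Python sketch of the idea. It is a toy model, not how Hyper-V is implemented: the `DiskImage` and `VirtualMachine` classes are hypothetical, and a disk is modelled as a dictionary mapping block numbers to data. Creating a checkpoint copies nothing; it simply freezes the current disk and chains a new, empty differencing disk on top of it.

```python
class DiskImage:
    """Hypothetical stand-in for a VHDX/AVHDX file: block number -> data."""

    def __init__(self, parent=None):
        self.parent = parent    # None for the original virtual hard disk
        self.blocks = {}        # only the blocks written to this file
        self.read_only = False

    def write(self, block, data):
        if self.read_only:
            raise PermissionError("virtual hard disk is read-only")
        self.blocks[block] = data


class VirtualMachine:
    def __init__(self, disk):
        self.active = disk      # the disk that currently receives writes

    def create_checkpoint(self):
        # No data is copied: the current disk is frozen and a new, empty
        # differencing disk becomes its child.
        self.active.read_only = True
        self.active = DiskImage(parent=self.active)

    def write(self, block, data):
        self.active.write(block, data)


base = DiskImage()
vm = VirtualMachine(base)
vm.write(0, b"config v1")
vm.create_checkpoint()
vm.write(0, b"config v2")          # redirected to the differencing disk

print(base.blocks[0])              # b'config v1' (original left unchanged)
print(vm.active.blocks[0])         # b'config v2'
print(vm.active.parent is base)    # True
```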
Now, suppose that an administrator created a checkpoint for a Hyper-V virtual machine and then attempted to upgrade an application that was running on the VM. Let’s also pretend that the application upgrade process has failed and that the virtual machine has been left in an undesirable state. The administrator can easily put things back the way that they were by applying the checkpoint.
When the administrator applies the checkpoint, Hyper-V deletes the differencing disk and resumes read/write operations on the original virtual hard disk (there are actually a few different options for how checkpoints are applied, but this is the simplest use case). At this point, the virtual machine is back to normal.
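Extending the same kind of toy model, applying a checkpoint in the simple case described above amounts to throwing away the differencing disk and making its parent writable again. The classes are hypothetical, and real Hyper-V offers more options when applying a checkpoint than this sketch models.

```python
class DiskImage:
    """Hypothetical stand-in for a virtual hard disk file: block -> data."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}
        self.read_only = False


class VirtualMachine:
    def __init__(self):
        self.active = DiskImage()   # the original virtual hard disk

    def write(self, block, data):
        self.active.blocks[block] = data

    def read(self, block):
        # Check the newest disk first, then fall back to its parents.
        disk = self.active
        while disk is not None:
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        raise KeyError(block)

    def create_checkpoint(self):
        self.active.read_only = True
        self.active = DiskImage(parent=self.active)

    def apply_checkpoint(self):
        # Simplest case: discard the differencing disk and resume
        # read/write operations on the original virtual hard disk.
        parent = self.active.parent
        if parent is None:
            raise RuntimeError("no checkpoint to apply")
        parent.read_only = False
        self.active = parent


vm = VirtualMachine()
vm.write(0, b"app v1")
vm.create_checkpoint()                     # taken just before the upgrade
vm.write(0, b"app v2 (failed upgrade)")
vm.apply_checkpoint()                      # undo the upgrade
print(vm.read(0))                          # b'app v1'
```

Note that `create_checkpoint` is called immediately before the risky change. A checkpoint taken earlier would also discard any legitimate work done in between when it is applied.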
Why Checkpoints Are Not Backup Replacements
The most fundamental reason why checkpoints are not effective backup replacements is that the checkpointing process does not create a copy of the virtual hard disk. As such, checkpoints do nothing to protect against physical disk failure or against damage to the virtual hard disk file. A differencing disk records only the changes made after the checkpoint was created and relies on its parent for everything else, so if a virtual machine’s virtual hard disk were damaged or destroyed, the virtual machine’s checkpoints would be useless.
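The dependency is easy to see in a toy model. In the hypothetical Python sketch below, each disk is a dictionary of block numbers to data, and the differencing disk holds only the blocks that changed after the checkpoint. Once the parent’s contents are gone, the unchanged blocks cannot be recovered through the checkpoint at all.

```python
class DiskImage:
    """Hypothetical stand-in for a virtual hard disk file: block -> data."""

    def __init__(self, parent=None):
        self.parent = parent
        self.blocks = {}

    def read(self, block):
        # Newest disk first, then walk up the parent chain.
        disk = self
        while disk is not None:
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        raise KeyError(block)


def readable(disk, block):
    """Return True if `block` can still be read through the disk chain."""
    try:
        disk.read(block)
        return True
    except KeyError:
        return False


base = DiskImage()
for n in range(4):
    base.blocks[n] = f"original block {n}".encode()

diff = DiskImage(parent=base)          # created when the checkpoint was taken
diff.blocks[2] = b"changed block 2"    # only modified blocks are stored here

print(diff.read(0))                    # b'original block 0' (served by the parent)

base.blocks.clear()                    # simulate a damaged or lost parent disk

still_readable = [n for n in range(4) if readable(diff, n)]
print(still_readable)                  # [2]: nothing else was ever copied
```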
Another thing to consider is that the differencing disks used by the checkpoint process are usually stored on the same physical volume as the virtual hard disk. Hence, if the volume were to be damaged, there is a good chance that both the virtual hard disk and the differencing disks would be lost.
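One quick sanity check is whether a virtual hard disk and its checkpoint files actually share a volume. The sketch below uses Python’s `os.stat`, whose `st_dev` field identifies the device on which a file resides, so two files with the same `st_dev` are on the same volume. The helper name and the example paths are hypothetical, and the check only tells you that the files share a volume, not how well that volume is protected.

```python
import os


def on_same_volume(path_a, path_b):
    """Return True if both existing paths reside on the same device/volume."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Hypothetical paths; substitute the real file locations on your host.
vhdx = r"D:\VMs\app01\app01.vhdx"
avhdx = r"D:\VMs\app01\app01_checkpoint.avhdx"

if os.path.exists(vhdx) and os.path.exists(avhdx):
    if on_same_volume(vhdx, avhdx):
        print("Parent disk and checkpoint share a volume; losing it affects both.")
    else:
        print("Parent disk and checkpoint are on different volumes.")
```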
Another key reason why checkpoints are a poor substitute for backups is that they do not allow for single-item recovery. A checkpoint can be used to roll back an entire virtual machine, but it cannot be used to roll back an individual file or application on that virtual machine.
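A toy model makes the all-or-nothing behaviour visible. In the hypothetical sketch below, each dictionary key stands in for a file on the VM’s disk. The goal is to recover a single corrupted document, but applying the checkpoint also discards legitimate work done since the checkpoint was taken.

```python
class VirtualMachine:
    """Hypothetical model: each key stands in for a file on the VM's disk."""

    def __init__(self):
        self.base = {}      # original virtual hard disk
        self.diff = None    # differencing disk, present while a checkpoint exists

    def create_checkpoint(self):
        self.diff = {}      # new, empty differencing disk

    def write(self, name, data):
        target = self.diff if self.diff is not None else self.base
        target[name] = data

    def read(self, name):
        if self.diff is not None and name in self.diff:
            return self.diff[name]
        return self.base[name]

    def apply_checkpoint(self):
        # All-or-nothing: every change since the checkpoint is discarded;
        # there is no way to pick individual files.
        self.diff = None


vm = VirtualMachine()
vm.write("report.docx", "good report")
vm.write("orders.db", "100 orders")
vm.create_checkpoint()
vm.write("orders.db", "150 orders")      # legitimate new work
vm.write("report.docx", "corrupted")     # the one item we want back
vm.apply_checkpoint()

print(vm.read("report.docx"))            # good report
print(vm.read("orders.db"))              # 100 orders (the new orders are lost too)
```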
This brings up another important point: checkpoints can cause significant problems for application servers. Early versions of Hyper-V were known for causing data corruption problems when checkpoints were applied to virtualized application servers. Microsoft eventually addressed this problem by introducing production checkpoints, which use the Volume Shadow Copy Service (VSS) in Windows guests, or a file system freeze in Linux guests, to capture a data-consistent state. Even so, production checkpoints do not address all of the potential consistency problems that checkpoints can cause.
Most application servers have dependencies on other servers. An application might, for example, be tied to a SQL Server, a web front end, or perhaps an LDAP server. If a checkpoint were used to roll back an application server, a consistency problem could occur because the servers it depends on were not rolled back along with it. Of course, the likelihood of this happening varies depending on the application and the role that the virtual machine is performing, but application consistency should always be a consideration when using checkpoints.
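The consistency problem can be sketched with two hypothetical VMs: an application server (`app01`) and the SQL Server it depends on (`sql01`). Only the application server is checkpointed before an upgrade that also migrates the database schema. State is modelled as a plain dictionary, and the checkpoint is stored as a copy purely as a modelling shortcut; the point is which server gets rolled back, not how.

```python
import copy


class Server:
    """Hypothetical VM whose state is a plain dictionary."""

    def __init__(self, name, state):
        self.name = name
        self.state = state
        self._saved = None

    def create_checkpoint(self):
        # Modelling shortcut only: a real checkpoint does not copy data.
        self._saved = copy.deepcopy(self.state)

    def apply_checkpoint(self):
        self.state = copy.deepcopy(self._saved)


app = Server("app01", {"expects_schema": 1})
sql = Server("sql01", {"schema": 1})

app.create_checkpoint()            # only the application server is protected
app.state["expects_schema"] = 2    # the upgrade changes the application...
sql.state["schema"] = 2            # ...and migrates its database

app.apply_checkpoint()             # the upgrade fails; roll back app01 alone

print(app.state["expects_schema"], sql.state["schema"])    # 1 2
print(app.state["expects_schema"] == sql.state["schema"])  # False
```

In this model, the mismatch goes away only if `sql01` is also returned to a matching point.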
Checkpoints Can Hurt Virtual Machine Performance
One of the biggest reasons to use checkpoints sparingly is that they can significantly degrade the performance of your virtual machines. The performance impact probably won’t be all that noticeable at first, but as additional checkpoints are created, the virtual machine’s performance tends to drop off sharply.
The reason why checkpoints can hurt performance has to do with the way that checkpoints work. As previously noted, checkpoints redirect write operations to a differencing disk. So with that in mind, consider what happens when a virtual machine performs a read operation.
When a read occurs, the VM attempts to read data from the differencing disk (remember, the differencing disk contains the most recent data). If the VM does not find what it is looking for on the differencing disk, then it attempts to read the data from the original virtual hard disk.
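Here is a minimal Python sketch of that read path for a single checkpoint. It is a simplified model, not Hyper-V’s actual implementation: the `read_block` function is hypothetical, each disk is a dictionary of block numbers to data, and the chain is listed newest disk first.

```python
def read_block(chain, block):
    """Return (data, layer) for `block`, where layer 0 is the newest disk.

    `chain` lists disks newest-first, e.g. [differencing_disk, original_disk].
    Each disk is modelled as a dict mapping block number -> data.
    """
    for layer, disk in enumerate(chain):
        if block in disk:
            return disk[block], layer
    raise KeyError(block)


original = {0: b"boot", 1: b"old config", 2: b"app data"}
differencing = {1: b"new config"}   # only blocks written since the checkpoint

chain = [differencing, original]
print(read_block(chain, 1))   # (b'new config', 0): found in the differencing disk
print(read_block(chain, 2))   # (b'app data', 1): fell through to the original
```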
Now, suppose for a moment that an administrator creates an additional checkpoint for a virtual machine. Hyper-V will then treat the existing differencing disk as read-only and will create a new differencing disk. All future write operations will now be directed to this differencing disk. Here is a brief diagram showing this in action:
Now, when the virtual machine attempts a read operation, it will first look for the data on the most recently created differencing disk. If that differencing disk does not contain the requested data, then the virtual machine will look to the previously created differencing disk. If the data cannot be found there, then Hyper-V will attempt to read the data from the original virtual hard disk. Hence, each checkpoint adds another layer that a read may have to pass through, and the longer the checkpoint chain grows, the more read performance can suffer.
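The cost of a longer chain can be illustrated by counting how many disks a read has to check. In this hypothetical sketch, each checkpoint adds an empty differencing disk, and the block being read was last written before the first checkpoint, which is the worst case. A real VHDX/AVHDX implementation resolves blocks through on-disk metadata rather than dictionary lookups, so treat this as an illustration of the trend rather than a performance model.

```python
def disks_checked(chain, block):
    """Return how many disks are examined before `block` is found.

    `chain` lists disks newest-first; each disk maps block number -> data.
    """
    for count, disk in enumerate(chain, start=1):
        if block in disk:
            return count
    raise KeyError(block)


original = {n: f"block {n}" for n in range(8)}

for checkpoints in range(4):
    # One (empty) differencing disk per checkpoint, stacked on the original.
    chain = [{} for _ in range(checkpoints)] + [original]
    print(f"{checkpoints} checkpoint(s): {disks_checked(chain, 5)} disk(s) checked")
# 0 checkpoint(s): 1 disk(s) checked
# 1 checkpoint(s): 2 disk(s) checked
# 2 checkpoint(s): 3 disk(s) checked
# 3 checkpoint(s): 4 disk(s) checked
```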
All said, checkpoints and snapshots are great for their intended use case (not backups!), but you have to keep them properly managed and maintained.