Hyper-V Differencing Disks for SQL Server Database Copies

Have you spent too many hours providing your development group(s) with a copy of your production databases? Do you face challenges managing a complex process to get those copies up and running? Today I will cover a method of copying your production data in your development environment in a quick and space efficient manner using Hyper-V parent and differencing disks.

Here’s the process at a high level, detailed instructions are provided later in the post:

Create the parent disk containing the databases

The goal here is to create a parent virtual hard disk that will contain the main copy of your database from which one or many differencing virtual hard disks will derive from in a parent/child relationship type.

  1. Create a VHDX disk in Hyper-V that will act as the parent disk and attach it to a VM running SQL Server. I would recommend that you use a non-production/lab VM to complete the overall parent disk preparation.
  2. Restore the desired database from your production data using the backup solution of your choice; Altaro VM Backup would be a good choice and SQL Server native backup/restore will work fine as well. It’s important to note that the target location of the restore should be in the path of the attached empty parent disk created in step 1. You wouldn’t want to overwrite an active database in the process!
  3. Depending on your restore method, you might need to attach your database in SQL Server by pointing to the files restored in step 2. Some restore processes will do that automatically for you.
  4. Once your database copy on your new parent disk is online and available, perform any operations in it that would be applicable to all your development or test environments
  5. Detach the database copy residing on your parent disk from your SQL Server instance
  6. Detach the parent disk containing a copy of your database that is ready to be linked to differencing disks

It’s important to note that from this point on, your parent disk should not be modified. Any changes to the parent disk will permanently break all the differencing disks depending on it. If you haven’t linked any differencing disk to it yet and you forgot to make a change to your parent, it’s still safe to do so in that particular state only.

Create the databases clones

Now that we have our parent disk ready to use, we need to create a differencing virtual hard disk that will be linked to it.

  1. While creating a new differencing disk in Hyper-V Manager or PowerShell, make sure you specify the parent created after the step 6 in the process above. Once the disk is created, you can attach it to the desired VM that is running your development or test environment
  2. Attach the databases contained in the differencing disk (child) to the desired SQL Server instance running in your VM
  3. Perform any operations in your cloned databases that would be applicable only to that particular environment
  4. Profit!

As you can see, the process is not exactly rocket science. The magic comes from using this method in an automated manner. PowerShell is really the system administrator’s best friend once again!

Before we dive in the details of how that works, let’s cover some of the pros/cons of using this method:

Pros

  • You only have to keep one copy of the production data independently of how many development/test copies you need
  • Creating the clones will take a few minutes at the most, independently of your database size
    • You pay the time penalty of the restore only once
  • The created differencing disk will only require a few hundred megabytes in storage initially
    • This will change based on the write activity in the databases afterwards
  • If you need to apply a certain set of scripts on top of your production data following the restore, you can do it on:
    • The parent disk if those changes are applicable to all your development environment
      • You can perform physical and logical defragmentation of your indices, which will also benefit all child environments
      • Lengthy operations will only need to be performed once per parent disk, which is a great time saver in your database refresh process
      • The IO burden on your storage solution will be reduced as you will be performing operations only once
    • The cloned copy that is specific to a particular environment
  • You can include multiple databases within a single parent disk to maintain data coherency between multiple systems
  • You can create a new parent from an existing parent. Here’s an example
    • Create a first parent that contains a copy of your production database as is
    • Create a new differencing disk that is referencing the above parent disk
    • Run your database schema upgrade script for your current development stream
    • Use this differencing disk as a parent for other child disk for other environments that require that specific database schema version
    • Using this method, you can have a tree of parent/differencing disks all deriving from your original database copy
  • The process is storage agnostic, sorry storage array vendors!
  • If your storage solution can pin specific files or blocks to a faster tier  (i.e. SSD) either manually or automatically, you can leverage this with your parent or child disks to speed up your environments in a cost effective manner without getting into deduplication

Cons

  • If you expect a lot of data churn in that particular environment, for instance, massive data loads or deletion, you can potentially lose some of the space saving benefits
    For example, I have seen large tables in business intelligence databases being truncated and reloaded on a daily basis. When that happens, the differencing disk in Hyper-V will grow to keep track of the delta between the parent and child disk. In that particular case, I saw the differencing disk becoming almost as large as the parent.
  • How you create your development copies of your database needs to be fairly standardized and scripted
    I plead guilty to forgetting that stored procedure permission for that obscure application user on more than one occasion. The fix, script it once and for all and live free! If you end up doing a lot of manual intervention or tweaks following the restore of your production data, you lose some of the efficiency gains of the automated process and the confidence of a repeatable process.
  • Doesn’t work for SQL Server Always On database copies
    The database needs to be mounted in restore mode in SQL Server for this to work. Since the database in the parent disk has been fully restored it’s not eligible to join an Always On database availability group.

The Detailed Process

In the following example, we will use a server named SRV10010 to act as both the server to create the parent disk and the server that needs to have a differencing disk.

Creating the Parent Disk

1) On the Hyper-V host, create a VHDX disk in Hyper-V that will act as the parent disk and attach it to a VM running SQL Server

New-VHD -Path Parent_SQL_GizmoSales_2016-01-25.vhdx -SizeBytes 10GB -Dynamic 
Add-VMHardDiskDrive -VMName SRV10010 -Path .Parent_SQL_GizmoSales_2016-01-25.vhdx -ControllerType SCSI

A couple of recommendations:

  • Use a standard naming convention for your parent disks. It will make them more discoverable in your environment.
  • Include a versioning notion in your parent disk naming. For instance, we typically use the date they were created to give a reference point of how fresh that parent is.

2) In the VM, mount and create a partition and volume on which your parent database will reside. Typically you will see the newly attached disk as Offline in Disk Management. Take a note of that disk number. For instance, if your newly created disk is seen as Disk 2 in the Disk Management console of your VM, you can run the following PowerShell commands:

Get-Disk -Number 2 | `
Initialize-Disk -PassThru | `
New-Partition -UseMaximumSize | `
Add-PartitionAccessPath -AccessPath "c:sqldataGizmoSales_2016-01-25" -PassThru | `
Format-Volume -NewFileSystemLabel GizmoSales_2016-01-25 -Confirm:$false -FileSystem NTFS -AllocationUnitSize 64KB

Doing the automation of properly identifying the newly attached disk involves a few more steps that won’t be covered in this article. It’s definitely something we will cover in the near future!

3) In the VM, restore the desired databases from your production data using the backup solution of your choice; Altaro VM Backup would be a good choice. SQL Server native backup/restore will work fine as well.

4) In the VM, perform any operations on your master databases that would be applicable to all your development or test environments

For instance, you could run the following script to update your statistics using either SQL Server Management Studio or sqlcmd:

USE GizmoSales 
GO
EXEC sp_updatestats 
GO

5) In the VM, detach the databases from your SQL Server instance using either SQL Server Management Studio or sqlcmd:

EXEC sp_detach_db @dbname = N'GizmoSales';

6) In the VM, cleanly unmount the volume by removing the access path to the disk and taking the disk offline. Please make sure that you use the proper disk number from your environment, Disk 2 in the example below could take down a database for someone in an unintended way!

Get-Partition -DiskNumber 2 -PartitionNumber 2 | Remove-PartitionAccessPath -AccessPath "c:sqldataGizmoSales_2016-01-25"
Remove-Item C:sqldataGizmoSales_2016-01-25
Set-Disk -Number 2 -IsOffline:$true

7) On the Hyper-V host, detach the parent disk from the VM:

Get-VM -Name SRV10010 | Get-VMHardDiskDrive |Where-Object {$_.Path -eq "H:Hyper-Vvm_disksParent_SQL_GizmoSales_2016-01-25.vhdx"} | Remove-VMHardDiskDrive

Creating the Differencing Disk

1) On the Hyper-V host, create a differencing disk in Hyper-V and attach it to the desired VM that is running your development or test environment

New-VHD -ParentPath Parent_SQL_GizmoSales_2016-01-25.vhdx -Path SRV10010_DAT_GizmoSales_DEV.vhdx 
Add-VMHardDiskDrive -VMName SRV10010 -Path .SRV10010_DAT_GizmoSales_DEV.vhdx -ControllerType SCSI

2) In the VM, put the disk online, mount the volume in the VM and assign either an access path to the volume or a drive letter

Set-Disk -Number 2 -IsOffline:$false
New-Item -Path c:sqlDataGizmoSales_DEV -Type Directory
Get-Partition -DiskNumber 2 -PartitionNumber 2 | Add-PartitionAccessPath -AccessPath "c:sqlDataGizmoSales_DEV"

3) In the VM, rename the data files from the parent to match the name of your target environment:

Rename-Item C:SQLDataGizmoSales_DEVGizmoSales_2016-01-25.mdf C:SQLDataGizmoSales_DEVGizmoSales_DEV.mdf
Rename-Item C:SQLDataGizmoSales_DEVGizmoSales_2016-01-25_log.ldf C:SQLDataGizmoSales_DEVGizmoSales_DEV_log.ldf

4) In the VM, attach the databases contained in the differencing disk to the desired SQL Server instance. You can use the GUI or a script like the following in SQL Server Management Studio or sqlcmd:

CREATE DATABASE GizmoSales_DEV 
ON (FILENAME = 'C:SQLDataGizmoSales_DEVGizmoSales_DEV_Data.mdf'), 
(FILENAME = 'C:SQLLogsGizmoSales_DEVGizmoSales_DEV_Log.ldf') 
FOR ATTACH;

5) You can now perform any operations in your cloned databases that would be applicable only to that particular environment. For instance, run a T-SQL script to upgrade the schema of the database to a newer version.

6) Profit!

The size of the parent VHDX vs the differencing disk

The size of the parent VHDX vs the differencing disk

As you can see in the image above, even with a dummy database that doesn’t contain any data, you already save some storage!

Now that you know how to do it by hand, there’s a few scripts you can use to inspire yourself on the GEM Automation project under the VHDXSQLRefresh section on CodePlex. At this point in time, it does not support the full automation process using Altaro but it can give you a hint as to how you can get this going in your environment. It may be the topic of a future post if people are interested.

Should you have any questions regarding this post, feel free to comment below, I’ll be glad to help you out!

 

Threat Monitor
Share this post

Not a DOJO Member yet?

Join thousands of other IT pros and receive a weekly roundup email with the latest content & updates!

Leave a comment or ask a question

Your email address will not be published. Required fields are marked *

Your email address will not be published. Required fields are marked *

Notify me of follow-up replies via email

Yes, I would like to receive new blog posts by email

What is the color of grass?

Please note: If you’re not already a member on the Dojo Forums you will create a new account and receive an activation email.