Making sure you have verified backups of your data and VMs in Azure is critical. But backup is more than just copying data: it’s part of wider Disaster Recovery (DR) preparedness, and as Azure becomes a platform for your business, your DR plan needs to be solid. In this article, we’ll look at how best to achieve this, how to handle business-critical workloads, and how to make the most of a new feature, Cross Region Restore (CRR).
A quick note: all Azure regions are made up of one or more datacenters, and each datacenter has separate power, cooling and networking infrastructure. Each region also has a paired region in the same country or geographical area, so you can comply with data residency requirements while still having the option to replicate data in case of a region outage.
Would you believe that in the early days of IaaS VMs becoming available in Azure, there was no platform backup system on offer? The recommendation was to run System Center Data Protection Manager (DPM) in a VM and back up your other VMs to it.
Times have definitely changed, and Azure Backup is now a very capable enterprise data protection solution that safeguards much more than your Azure VMs. In fact, you can use Azure Backup to protect Linux and Windows Azure VMs, SQL Server and SAP HANA VMs, Azure File shares and on-premises VMs using either the Microsoft Azure Recovery Services (MARS) agent or the Microsoft Azure Backup Server (MABS) option.
Production VMs protected in Azure Backup
Let’s start with Azure VMs. The first step is creating a vault to store backups. Each vault can hold up to 1000 VMs (a total of 2000 data sources) and you can back up each Azure VM once a day. Each region needs its own vault (if you have deployments globally), and a VM can only be backed up to a vault in its own region. Each disk can be up to 32 TB in size, and in total the disks for a VM can be up to 256 TB. Windows VM backups are application-aware, whereas Linux VM backups are file-consistent unless you use custom scripts.
The first choice is which underlying type of Azure storage you’re going to use, because once you’ve started protection this can’t be changed. You can pick from Locally Redundant Storage (LRS), which keeps three copies of your data in a single region, or Geo Redundant Storage (GRS), which keeps three copies in the local region and three additional copies in the paired region. Currently only UK South and Southeast Asia support the third option, Zone Redundant Storage (ZRS) for backup, which spreads copies of your data across different datacenters in the same region. The default and recommended option is GRS.
Once you’ve created the vault, simply define one or more policies that specify when to back up and how long to keep the backups. For SQL Server (in a VM) you can define log backups as often as every 15 minutes.
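To make those limits concrete, here is a minimal Python sketch that checks a planned set of VMs against the numbers quoted above. The `VmPlan` class and `check_vault_plan` function are hypothetical helpers made up for illustration; they aren’t part of any Azure SDK, and the limits are simply the ones from this article, so check the current documentation before relying on them.

```python
from __future__ import annotations

from dataclasses import dataclass

# Limits as quoted in the article; current Azure limits may differ.
MAX_VMS_PER_VAULT = 1000
MAX_DISK_TB = 32
MAX_TOTAL_DISK_TB = 256


@dataclass
class VmPlan:
    """Hypothetical description of a VM you intend to protect."""
    name: str
    region: str
    disk_sizes_tb: list[float]


def check_vault_plan(vault_region: str, vms: list[VmPlan]) -> list[str]:
    """Return human-readable problems; an empty list means the plan fits."""
    problems = []
    if len(vms) > MAX_VMS_PER_VAULT:
        problems.append(f"{len(vms)} VMs exceeds the {MAX_VMS_PER_VAULT}-VM vault limit")
    for vm in vms:
        # A VM can only be backed up to a vault in its own region.
        if vm.region != vault_region:
            problems.append(f"{vm.name}: in {vm.region}, but the vault is in {vault_region}")
        if any(size > MAX_DISK_TB for size in vm.disk_sizes_tb):
            problems.append(f"{vm.name}: a disk is larger than {MAX_DISK_TB} TB")
        if sum(vm.disk_sizes_tb) > MAX_TOTAL_DISK_TB:
            problems.append(f"{vm.name}: total disk size exceeds {MAX_TOTAL_DISK_TB} TB")
    return problems
```

Calling `check_vault_plan("uksouth", [...])` with VMs from several regions quickly shows which ones need a vault of their own.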
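A policy boils down to two numbers: how often and how long. As a rough sketch, the arithmetic below shows how many recovery points a schedule holds once it reaches steady state. The function name and the retention values in the example are assumptions for illustration, not Azure defaults.

```python
from datetime import timedelta


def recovery_points_retained(frequency: timedelta, retention: timedelta) -> int:
    """Number of recovery points held at steady state for a given schedule."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    # Dividing one timedelta by another with // yields a whole number.
    return retention // frequency


# Daily VM backups kept for 30 days (a hypothetical retention value).
daily_points = recovery_points_retained(timedelta(days=1), timedelta(days=30))

# SQL Server log backups every 15 minutes, counted over a single day.
log_points_per_day = recovery_points_retained(timedelta(minutes=15), timedelta(days=1))
```

The frequency is also your worst-case data loss: with 15-minute log backups you can lose at most about 15 minutes of transactions, while a once-a-day VM backup can lose up to a day.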
SQL Server backup policy
When the time comes to restore (which is the point really: nobody wants backup for its own sake, what you want is the successful recovery of the VM or the data) you have several options. If you need to restore individual files, a recovery point (by default the latest one) will be mounted as a local drive through a script that you download, allowing you to browse the file system and grab the files you need, as you can see here:
Script mounting drives for file recovery
When it comes time to restore a corrupted VM (or just to test your DR plan, something that you should do regularly) you can create a new VM, specifying the Resource Group (RG), virtual network (VNet) and storage account. This new VM must be created in the same region as the source (but see CRR below). You can also restore just a VM’s disk(s), which also gives you a template that you can customize to create a new VM based on the restored disks. A third option is to replace an existing VM, while the fourth option is CRR.
Backup jobs reporting in Azure Backup
If you have VMs on premises that you’d like to back up to the cloud, you have three options. The first is the MARS agent, which lets you back up any Windows server, anywhere, to Azure. If you have a handful of servers this is definitely an easy option (essentially replacing Windows Server Backup with a similar tool that includes support for Azure as a destination). MARS supports files, folders and system state and backs up twice a day, but if you have more than a few servers, MABS is a better option.
MABS is a “free” version of System Center Data Protection Manager (DPM) that doesn’t support backing up to tape or protecting one DPM server with another. With MABS you don’t pay for the license of the server itself; instead, you pay for each protected instance. The beauty of MABS is that you first protect workloads on premises to local disk (as often as every 15 minutes if you need it) and it then synchronizes recovery points to Azure up to three times a day. This makes most recoveries much faster as data doesn’t have to be downloaded from Azure. The third option is to use DPM, with the addition of Azure as a secondary backup storage location (replacing tapes).
Note that restore operations from the cloud to on-premises are free. You don’t pay the normal data egress charges as data is downloaded.
Azure Site Recovery
Backup is essential and it’s what you need when everything else has failed. But recovering from a large-scale outage, either in the Azure platform or due to an attack such as ransomware, by just restoring backups is a time-consuming proposition. Some business-critical workloads require more than a mere backup; they need a full DR plan. This can take the form of High Availability by spreading workloads across Availability Zones, using a load balancer to provide multi-server redundancy, distributing the data to multiple regions using Cosmos DB or putting Front Door in front of a global web application. Here, we’re going to look at Azure Site Recovery (ASR). Symon looked at ASR in the context of on-premises, geo-distributed Hyper-V clusters in this blogpost.
Where Azure Backup is “copy your VM / data to a separate storage location on a regular cadence”, ASR is “replicate disk changes from VMs (and physical servers) on a continuous basis to a separate host” for very fast recovery. They’re not mutually exclusive: having tamper-proof historical backup recovery points is going to save your behind when ransomware strikes or you discover that a super important document folder was deleted two weeks ago. But replication is what’s going to make you the hero when a region in Azure is down and you can bring up the replicated VMs in the paired region in minutes. Be aware that replication is continuous (with recovery points kept for 24 hours by default and app-consistent snapshots generated every 4 hours by default), so if a file server VM is infected with ransomware, ASR will dutifully replicate the encrypted files to the target region almost instantaneously. This is why Azure Backup and ASR need to be used together.
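The ransomware scenario is easy to reason about with a little date arithmetic. The sketch below uses the two defaults mentioned above and assumes, purely for illustration, that app-consistent snapshots land at exact 4-hour steps counting back from now. The function name is made up; it doesn’t correspond to any ASR API.

```python
from datetime import datetime, timedelta
from typing import Optional

ASR_RETENTION = timedelta(hours=24)        # default quoted in the article
APP_CONSISTENT_EVERY = timedelta(hours=4)  # default quoted in the article


def latest_clean_app_consistent_point(now: datetime, infected_at: datetime) -> Optional[datetime]:
    """Return the newest app-consistent snapshot taken before the infection,
    or None if every snapshot still inside the retention window is tainted."""
    oldest_kept = now - ASR_RETENTION
    # Illustrative assumption: snapshots exist at now, now-4h, now-8h, ...
    point = now
    while point >= oldest_kept:
        if point < infected_at:
            return point
        point -= APP_CONSISTENT_EVERY
    return None
```

If the infection is spotted within a few hours, a clean replicated point probably still exists. If it went unnoticed for longer than the retention window, the function returns `None`, and that’s the moment you’ll be glad you also have Azure Backup recovery points going back weeks.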
ASR can replicate on-premises Hyper-V, VMware VMs or physical servers for DR and provides recovery plans to orchestrate complex applications (for example: bring up the DCs first, then the database servers, stop for a manual step to run a script to check database consistency, then start the front end servers), along with many other features. The other way to use ASR is to replicate from one Azure region to another.
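To illustrate what a recovery plan does, here is a small Python sketch of the ordering logic from the example above: start groups of VMs in sequence and pause at a manual step until someone confirms it. The `Step` class, `run_plan` function and VM names are all hypothetical; real ASR recovery plans are defined in the service itself, not in code like this.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One stage of a hypothetical recovery plan."""
    name: str
    vms: list[str] = field(default_factory=list)
    manual: bool = False  # True means: wait for a human before continuing


def run_plan(steps: list[Step],
             start_vm: Callable[[str], None],
             confirm: Callable[[str], bool]) -> list[str]:
    """Run steps in order; stop if a manual step is not confirmed."""
    log = []
    for step in steps:
        if step.manual:
            if not confirm(step.name):
                log.append(f"stopped at manual step: {step.name}")
                return log
            log.append(f"manual step done: {step.name}")
            continue
        for vm in step.vms:
            start_vm(vm)
            log.append(f"started {vm}")
    return log


plan = [
    Step("Domain controllers", ["dc01"]),
    Step("Database servers", ["sql01"]),
    Step("Check database consistency", manual=True),
    Step("Front end servers", ["web01", "web02"]),
]
```

Passing `print` as `start_vm` and a function that always returns `True` as `confirm` walks through the whole plan; returning `False` from `confirm` leaves the front end servers untouched until the database check has been done.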
You can group up to 16 VMs together into replication groups so that all VMs that make up an application also share application and crash-consistent recovery points. You can also use recovery plans, including adding automation runbooks to ensure that your VMs are started in the right order and recovery tasks are automated.
Whether for on-premises to Azure or Azure to Azure DR, you don’t pay for VMs in the target location, just a per-VM cost plus storage costs. Only when you do a test failover or a real failover, which creates VMs, do you pay VM running costs. And the first 31 days of protection for each replicated VM are free.
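As a back-of-the-envelope sketch of the per-VM charge, the function below only applies the 31-day free period; storage and failover compute costs are left out. The daily rate is a placeholder you’d need to fill in from the current price list, and billing by whole days is an assumption made for simplicity, not a statement of how Azure actually meters ASR.

```python
FREE_DAYS = 31  # free period per replicated VM, as quoted in the article


def asr_protection_fee(days_protected: int, daily_rate_per_vm: float) -> float:
    """Per-VM protection fee after the free period (storage and compute excluded).

    daily_rate_per_vm is a hypothetical input; look up the real price.
    """
    if days_protected < 0:
        raise ValueError("days_protected cannot be negative")
    billable_days = max(0, days_protected - FREE_DAYS)
    return billable_days * daily_rate_per_vm
```

For example, a VM replicated for 41 days has 10 billable days, so at a made-up rate of 0.5 per day the fee would be 5.0.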
Cross Region Restore
If you’re using Azure Backup to protect VMs in one region and you’ve configured the vault(s) to use GRS, you might assume that you could restore them in the secondary region at will. Not so, unless Microsoft declares a disaster in your primary region. Cross Region Restore (CRR), currently in preview, changes this dynamic and lets you decide when to restore a VM in the secondary, paired region, perhaps for testing purposes or because something’s happening to your resources in the primary region, but the problem isn’t large enough for Azure to declare an outage.
If you already have a Recovery Services vault that’s using GRS, you can enable CRR under Properties. This action cannot be undone, so you can’t turn a CRR-enabled vault back into a plain GRS vault. Note that if you have a vault that’s using LRS and already has protected data in it, you’ll need to perform some workarounds.
Enable Cross Region Restore for a vault
Currently, CRR supports Azure VMs (with disks smaller than 4 TB), SQL databases hosted on Azure VMs and SAP HANA databases in Azure VMs. Encrypted VM disks are supported for restore, including the built-in Storage Service Encryption (SSE) as well as Azure Disk Encryption (ADE).
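Putting the CRR requirements from this section together, a quick eligibility check might look like the sketch below. The workload labels and the `crr_blockers` function are invented for illustration, and the rules are only those stated in this article (GRS vault, CRR enabled, supported workload, Azure VM disks smaller than 4 TB), which may change as the preview evolves.

```python
from __future__ import annotations

# Hypothetical labels for the workloads the article lists as supported.
SUPPORTED_WORKLOADS = {"azure_vm", "sql_in_azure_vm", "sap_hana_in_azure_vm"}
DISK_LIMIT_TB = 4  # Azure VM disks must be smaller than this


def crr_blockers(vault_redundancy: str, crr_enabled: bool,
                 workload: str, disk_sizes_tb: list[float]) -> list[str]:
    """Return reasons a Cross Region Restore would not be possible."""
    blockers = []
    if vault_redundancy != "GRS":
        blockers.append("vault must use GRS")
    if not crr_enabled:
        blockers.append("CRR is not enabled on the vault")
    if workload not in SUPPORTED_WORKLOADS:
        blockers.append(f"workload '{workload}' is not supported")
    # The article states the disk size limit for Azure VMs only.
    if workload == "azure_vm" and any(s >= DISK_LIMIT_TB for s in disk_sizes_tb):
        blockers.append(f"a disk is {DISK_LIMIT_TB} TB or larger")
    return blockers
```

An empty list means nothing in this article would stop the restore; anything else tells you what to fix first.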
CRR is based on customer feedback and it makes a lot of sense for Microsoft to provide more control for customers as to when, where and how they restore their workloads. There could be regulatory or audit reasons to test restores and CRR also obviates any waiting time for Microsoft to declare a disaster for the primary region.
Remember, just because it’s in the cloud doesn’t mean you can forget about backup and DR. Your VMs are still your responsibility.