Some argue that data has become the world’s most valuable commodity, worth even more to businesses than gold. Unfortunately, it is also much easier to lose all of your data than a stockpile of precious metals. As the importance of data increases, so does the need to protect it using backup software and recover it if a disaster strikes.
Conceptually, a backup is fairly straightforward: it involves making a copy of a file, database, or computer and saving it as a backup file. The initial backup usually takes longer since all the data must be captured. Subsequent backups can be much quicker because usually only the changes since an earlier backup are saved, which is how incremental and differential backups work (these are discussed later). Recovery allows you to restore that backup file to the original or a new computer with the same information or state as when it was backed up.
When planning your backup strategy, there may be dozens or even hundreds of variables to consider. Some organizations plan their backups based on the location of their datacenter(s), others decide based on their existing storage providers, and some look at the feature set or cost of the backup software. Still others decide based on how quickly they can recover after a disaster to minimize their downtime. There is no single right strategy, so ultimately it comes down to the organization’s priorities and budget.
This article introduces computer data backup for enterprises by explaining how it works and reviewing different planning considerations.
Components which can be Backed Up in the Enterprise
Here are the most common components that enterprises should consider protecting, starting from smallest to largest.
- Files – Files are generally the smallest unit of backup, although individual blocks of a file can also be protected. Operating systems like Windows or Windows Server will usually give you the option to automatically back up all of your files to a local disk or to a cloud service like Microsoft OneDrive. Sometimes you can even restore files to different points in time by picking which backup you want to recover from. You may want to create custom backup policies for specific files if they are important; otherwise, they will usually get backed up when the entire disk or computer is backed up. To save space, it is usually also possible to back up and recover specific items within an application, such as a single mailbox or message from an Exchange Server.
- Databases – Databases usually have customizable backup policies since they can often contain business-critical data. Most databases will have built-in backup tools, and third-party providers will usually support the most common databases like SQL Server. A computer’s registry is also an internal database, but this will usually get protected when the operating system is backed up.
- Disks / Drives – Most enterprises will back up their applications at the disk or drive level as their smallest unit to simplify management. Virtually all backup ISVs will support disk-level backup and recovery while supporting common storage optimizations like RAID, mirroring, and deduplication.
- Applications – Most applications will allow you to separately back up the application’s settings and its data. The former may include items like the user’s preferences or the application’s IP address. Generally, the application’s data is the most important, and the application may have its own native backup process or leverage an industry-wide framework that allows third-party solutions to perform a backup, like Microsoft Volume Shadow Copy Service (VSS). Check out this blog for more information about how application-consistent backups work using Microsoft VSS.
- Operating Systems – It is possible to back up an operating system. During this process, the configuration data and registry settings are protected, and the users’ files are backed up independently. Since these usually result in separate backup files, they can be restored individually. Restoring the settings effectively allows user preferences to be copied to a different device.
- Virtual Machines – A Virtual Machine (VM) usually consists of a virtual hard disk (VHD), which is a file hosting the VM’s OS, a second VHD that hosts the data for the application running inside the VM, and a configuration file which defines the settings for that VM. Each of these file types can be backed up and restored independently. Hyper-V, VMware, and Linux virtualization all operate slightly differently. Whichever one you choose, make sure the chosen backup solution supports your hypervisor. You should also check that your various guest operating systems running within the environment can also be protected.
- Mobile Devices – Backing up settings or data from a mobile device is usually done by the telecom carrier or the device manufacturer. If it is a device managed by the enterprise, then the security settings and user data can usually be backed up by IT. Some backup ISVs will support popular mobile operating systems, but many do not. If this is a requirement, make sure that you check that provider’s support matrix to ensure that your organization’s devices and their specific operating systems can be protected.
- PCs – Most standard computers will use the built-in backup solution that ships with them, such as Windows Backup. Similar to other systems, PCs will protect the computer’s configuration data from the registry separately from its users’ files. At the enterprise level, organizations can centrally control policies to automatically protect all the PCs in their environment and centrally store their data. Enterprise backup vendors may not support client operating systems, so always verify with your preferred ISV if this needs to be part of your protection plan.
- Servers – A server is protected just like a PC, where its configuration data and files are independently protected. Servers are usually centrally managed by the enterprise, giving them control over critical policies to ensure security and compliance. Most backup ISVs will support the common server operating systems, like Windows Server and VMware ESXi, and may have varying support for different Linux distributions.
- Clusters – A cluster is a collection of servers that are often running VMs. While the servers and VMs will be protected by the backup provider, the cluster configuration information is the component needing to be protected at the cluster level. Deploying and optimizing clusters can be complicated, so having a backup solution that can restore these settings is often desired by enterprises. Always make sure that your backup ISV is cluster-aware. For more information, check out Altaro’s blog on protecting Windows Server Failover Clusters.
- Network Devices – Configuring physical and virtual network devices is complicated, and once they are operational, it is important to retain their settings, security policies, and routing tables. Generally, networking devices will be able to output their configuration data to a standard file format (such as XML), which can then be protected by standard backup solutions. If recovery is needed, the device can import that same file. Make sure that you include any critical network devices in your planning.
- Datacenters – In the event that you lose an entire datacenter to a disaster, you will want to be able to recover and restart your critical services in a second site or in a public cloud. This is generally referred to as disaster recovery. When protecting an entire datacenter, you are essentially protecting all of its components as a single logical group, including the disks, VMs, servers, and network devices. If you are seeking a datacenter-wide protection solution, make sure that every component is protected by your backup ISV and that you have a disaster recovery plan which will let you restart your critical services in an alternative site.
As you can see, it is possible to protect almost every datacenter component. Next, you will want to consider where you are storing those backup files.
Storage Options for Enterprise Computer Backup Data
Storage will be a key factor in defining your backup strategy, and it may also impact which backup ISV you select. When you plan your backup strategy, you want to also think about the reasons that may lead to you needing to recover data. Is it because you are working with unsophisticated users, are you a target for hackers, or could a natural disaster destroy your datacenter? For this reason, you want to consider local, shared, and offsite storage options. The amount of data you are protecting (capacity) may also influence your decision as there is usually a tradeoff between ease-of-use and price. Finally, you will want to consider how quickly you need to recover from a disaster, and the storage speed can impact this.
- Local Storage – When a backup is created, the backup file is usually as large as the data set it is protecting. With this in mind, if you are running backups within an OS at the file level, the backup will generally be smaller than the OS itself because you are just copying the settings and user files, not the operating system. However, if you are protecting a whole drive, the backup will generally be the size of all the files on it since it needs to take full copies of the data. There are techniques to shrink this backup file, such as compression and deduplication, covered later in this article. If you want to back up a component to the PC or server it is running on, be aware of the file size to ensure that you have enough capacity for the initial backup while giving it room to grow. While local backups usually have very fast backup and recovery times, if you lose your laptop, then you may also lose your backup.
- Shared Storage – Most enterprises will back up all of their components onto centralized storage. This helps them because if any PC, mobile phone, or server crashes, gets stolen, or is destroyed, its data can be easily recovered to a replacement device. Centralized management usually provides operational efficiencies through shared policies and standards, and the storage may be cheaper than hosting the backup files on local devices. The main disadvantage is that all of the computers must regularly connect to the backup system to offload their backup files. This can create significant network traffic, so much so that some organizations create dedicated networks that handle all of the traffic for tasks like backup, deployment, and patching. The other challenge is that this centralized storage could be a single point of failure if a fire burns down the datacenter, so most enterprises will also look for an offsite solution.
- Offsite Storage – Having site-wide resiliency is an important part of business continuity planning. You should be able to copy your backup files from local storage or shared storage directly to an offsite location. Most organizations will use the shared storage as an intermediary device so that backup and recovery are quicker. When sending backups offsite, the data files must be encrypted as they travel through longer networks or the public Internet. Offsite storage could be a secondary datacenter managed by the enterprise, a partner or service provider’s cohosted datacenter, or even a public cloud service. Microsoft Azure is a popular backup solution for many enterprises as it has a broad set of supported data, applications, and virtual machines and offers backup as a service (BaaS). Many backup providers now offer backup to the cloud, such as Altaro’s VM Backup. Whenever you are using an offsite backup solution, keep in mind that you will still need to pay for the cloud storage capacity you are using and possibly for the network bandwidth that the backup traffic consumes.
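The advice to leave room for the initial backup and for growth can be turned into a rough estimate. The sketch below is only an illustration under stated assumptions: the function name, the daily change rate, and the compression ratio are hypothetical inputs, and it assumes one full backup followed by daily backups that each capture only that day's changes.

```python
def estimate_backup_capacity_gb(
    protected_gb: float,
    daily_change_rate: float,
    retained_days: int,
    compression_ratio: float = 1.0,
) -> float:
    """Rough capacity for one full backup plus daily change-only backups.

    compression_ratio is original size divided by stored size
    (2.0 means the stored backup is half the original size).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    full_gb = protected_gb
    # Assumption: every day changes the same fraction of the data set.
    changes_gb = protected_gb * daily_change_rate * retained_days
    return (full_gb + changes_gb) / compression_ratio


if __name__ == "__main__":
    # Hypothetical workload: 500 GB protected, 2% daily change, 30 days kept,
    # and backups that compress to half their original size.
    needed = estimate_backup_capacity_gb(500, 0.02, 30, compression_ratio=2.0)
    print(f"Estimated capacity: {needed:.0f} GB")
```

Real change rates and compression ratios vary widely by workload, so an estimate like this is best treated as a starting point to be checked against measured backup sizes.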
Once you have defined your site requirements, then you can consider additional backup features.
Storage Media for Enterprise Computer Backup Data
This section will review some of the storage media options that enterprises should consider with their backup strategy. These will be more applicable to shared storage, which is managed by the company, as local storage and cloud storage may have limited options. Centralized storage used to be more expensive when servers had to be connected using a storage area network (SAN) with proprietary storage connections called host bus adapters (HBAs). Now there are numerous storage protocols that use ethernet connections and NICs, simplifying management and reducing costs. The best solution for an enterprise will depend on its budget and recovery speed needs.
- Hard Disk – The most common type of storage media is traditional mechanical hard disks. Over the past decade, the commoditization of storage has significantly lowered the price of disks, making them affordable for backup. Access to specific files is quick as the location of the files is indexed, which means the recovery software can “jump” to the correct place on the disk. Hard disks are also supported by virtually every backup provider, and they can be easily replaced if they fail. Since they are mechanical, failures do happen, so they are usually deployed in a redundant configuration using disk management solutions like RAID and mirroring. Recovery speed is average and limited by the disk speed and the network bandwidth available to transmit and recover the backup files.
- Magnetic Tape Drive – Tape drives are the second most common type of backup storage media after hard disks. Tape is fairly cheap and can have massive capacity, so it is great for backup files that do not need to be regularly accessed, such as archival content. Tapes need to be read sequentially, which means that finding files can take a long time, but once the data has been located, it can be read quickly. The major criticism is that tape can fail and often becomes worn out if it is used frequently. Once a tape is full, an admin may even need to physically replace it with a new tape, which removes automation and significantly slows down recovery time. Check out this Altaro blog for more information about using backups with a tape drive.
- Solid State Drive (SSD) – SSDs have become more popular over the past few years as prices have declined; however, they can still be significantly more expensive than hard disks and magnetic tape drives. They are, however, much faster and more reliable as they do not have any moving parts. If you have the budget to use SSDs for backup, it is the best option and will give you the fastest recovery.
- Optical Storage – Optical storage includes DVDs and CDs, which use lasers to read and write the backup files. These disks are rarely used in the enterprise because of their limited capacity, slower speeds, and the frequency of needing manual intervention to change the disks. This is only a practical storage option for PCs and individual backup needs.
- Cloud Storage – Recently cloud storage has emerged as a popular option for storing backup files. While it is managed by third parties, which limits the amount of control that the enterprise has to manage this storage, it comes with numerous advantages. Usually, cloud backup will offer the option to use hard disks or SSDs, or both. Cloud storage generally comes with many resiliency features, and it functions as a disaster recovery solution. Cloud storage is usually fairly cheap since you are only paying for the capacity that you are using, rather than purchasing physical storage devices for future needs. The downside is that the data must be encrypted, and it is usually sent over the public Internet. This means that backup and recovery is usually much slower as the data must be copied over a longer distance.
Many organizations use a combination of storage media, which often dedicates the faster (and expensive) hardware to critical workloads and the slower (and cheaper) devices to less frequently accessed backup files. This is known as storage tiering, and more information can be found in this blog.
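One simple way to picture storage tiering is a rule that maps each backup to a tier. The sketch below assumes tiers are chosen purely by backup age. That is only one possible policy, and the tier names and thresholds are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical tiers and age thresholds, ordered from fastest to cheapest.
TIER_RULES = [
    (timedelta(days=7), "ssd"),
    (timedelta(days=90), "hard_disk"),
]
ARCHIVE_TIER = "tape"


def choose_tier(backup_date: date, today: date) -> str:
    """Pick a storage tier from the age of a backup."""
    age = today - backup_date
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    # Anything older than every threshold goes to the archive tier.
    return ARCHIVE_TIER


if __name__ == "__main__":
    today = date(2024, 2, 1)
    for taken in (date(2024, 1, 30), date(2024, 1, 1), date(2023, 1, 1)):
        print(taken, "->", choose_tier(taken, today))
```

A production policy would likely also weigh workload priority and access frequency, as described above, rather than age alone.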
Backup Methods for Enterprise Computer Data
This section describes the most common methods to create backup files. There is a tradeoff between storage capacity and recovery time, which organizations need to consider when selecting the best option for their business needs.
- Full Backup – The most common type of backup is a full backup where all of the data is captured. This is always required even if you are then using other backup methods, such as an incremental or differential backup. This process may look a little different based on the component you are protecting, whether it is a file or a virtual machine. The downside of full backups is that they consume a lot of storage space as a full copy of the protected component’s data is captured each time. A full backup performs the following steps:
- The backup is requested by a user or automatically on a schedule.
- The backup software will identify the type of component which needs to be protected.
- The backup software will wait for the component to be in a healthy state so that a complete backup can be created. This may involve pausing the component, flushing any transactions which are in memory (“quiescing”), or closing a file. It is important that the file is in a “consistent” state so that it is healthy when it is restored. Additionally, any other data that the file needs will be captured, which could include metadata, system settings, boot settings, and the disk layout.
- The backup files are stored.
- The backup software is notified that the backup completed successfully.
- Incremental Backup – An incremental backup will back up the data which has changed since the most recent backup, whether that was the full backup or a previous incremental backup. This method still requires an initial full backup, but afterward, it is much more efficient because each subsequent backup is faster and uses less storage space than the initial full backup. However, restoring from incremental backups is much slower because the backup files are structured like a “chain” and must be merged sequentially. If one of the backup files is missing or corrupt, the data cannot be recovered from that point onwards. An incremental backup performs the following steps:
- The backup is requested.
- If a full backup has not been taken, then take a full backup and save the backup file using the steps above. If a full backup has been taken, then only the changes since the most recent backup (full or incremental) will be saved in an incremental backup file.
- The full backup (if needed) and the incremental backup file are stored.
- The backup software is notified that the backup completed successfully.
- Differential Backup – A differential backup is similar to an incremental backup since it requires taking a full backup and then taking smaller subsequent backups. The main difference is that each of these secondary backups tracks all of the changes since the full backup, whereas an incremental backup tracks only the changes since the previous backup. The advantage of using differential backups is that only two files need to be merged to restore the backup (the full backup and the most recent differential backup), so this process is faster than restoring incremental backups. Also, if one backup file is deleted, it doesn’t impact any of the other backups unless it is the full backup file. The downside of using differential backups is that they take up more storage capacity than incremental backups since each backup file contains all the changes since the full backup. A differential backup performs the following steps:
- The backup is requested.
- If a full backup has not been taken, then take a full backup and save the backup file using the steps above. If a full backup has been taken, then all the changes since that last full backup will be saved in the differential backup file.
- The full backup (if needed) and the differential backup file are stored.
- The backup software is notified that the backup completed successfully.
- Continuous Data Protection – If you require data protection with almost no data loss, then some providers will offer a continuous data protection (CDP) option. This works by saving a copy of the file each time that a change has been made. Behind the scenes, this could use incremental backups, mirroring, or some other type of replication technology. The benefit is that in a disaster, there is very little data loss. The tradeoff is that this type of backup requires a lot of processing overhead and storage capacity for all of the backups, so it is usually expensive.
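The difference between incremental and differential backups is easiest to see with a small model. The following is a minimal sketch, not how any particular backup product works: it assumes a data set can be represented as a dictionary of file paths and contents, it ignores deleted files, and the helper names are hypothetical.

```python
from typing import Dict, List

Snapshot = Dict[str, str]  # file path -> file contents


def changes_since(current: Snapshot, baseline: Snapshot) -> Snapshot:
    """Return the files that are new or modified relative to the baseline."""
    return {p: c for p, c in current.items() if baseline.get(p) != c}


def restore(full: Snapshot, later_backups: List[Snapshot]) -> Snapshot:
    """Apply later backups on top of the full backup, oldest first."""
    restored = dict(full)
    for backup in later_backups:
        restored.update(backup)
    return restored


# Three days of a hypothetical data set.
day0 = {"a.txt": "v1", "b.txt": "v1"}
day1 = {"a.txt": "v2", "b.txt": "v1"}
day2 = {"a.txt": "v2", "b.txt": "v2", "c.txt": "v1"}

full = dict(day0)

# Incremental: each backup is relative to the previous backup.
inc1 = changes_since(day1, day0)
inc2 = changes_since(day2, day1)

# Differential: each backup is relative to the full backup.
diff1 = changes_since(day1, day0)
diff2 = changes_since(day2, day0)

# An incremental restore needs the whole chain, in order.
restored_from_incrementals = restore(full, [inc1, inc2])
# A differential restore needs only the full backup and the latest differential.
restored_from_differential = restore(full, [diff2])

if __name__ == "__main__":
    print("incremental day 2 holds:", sorted(inc2))
    print("differential day 2 holds:", sorted(diff2))
    print("restores match:", restored_from_incrementals == restored_from_differential)
```

In this model the second differential backup carries three files while the second incremental carries two, which mirrors the storage tradeoff described above. Dropping `inc1` from the incremental restore would leave `a.txt` at its old version, which is the broken-chain risk.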
It is possible to use multiple backup methods for different types of workloads based on their priority or other business needs. There are even more advanced backup types, which include synthetic full, reverse incremental, and incremental forever, although these are usually not offered by all ISVs.
Backup Features for Enterprise Computer Data
This section will review some of the popular backup features which different ISVs offer to enhance the speed or maximize the capacity of the backup and recovery process. It is still important that you verify with your ISV that the features you need are supported on your specific operating system, hypervisor, and storage media. The following list shows the most commonly requested features.
- Compression – A majority of backup providers will include built-in compression tools so that the backup files take up less space on the disk. Compression adds processing time to the backup process, and decompression will slow down recovery.
- Encryption – Since backup files will often contain sensitive information, many organizations will want to automatically protect them with encryption. This is particularly important if those backup files are stored in a remote location or transmitted across public networks. Encryption adds processing time to the backup process, and decryption will slow down recovery.
- Duplication – Some organizations wish to have multiple copies of each backup file, sometimes sending them to different locations so that the backup can be recovered from different sites.
- Deduplication – This feature will detect identical files or blocks of data and only retain one copy, removing the duplicates. This process can save a significant amount of storage space since many backup files contain identical and redundant content.
- Item-Level Backup and Recovery – Instead of protecting and recovering an entire database, some ISVs will provide more granular options for specific workloads. For example, with Exchange Server, it is a common practice to offer the ability to recover a specific mailbox for a single user rather than to restore the database with all mailboxes for all users.
- Data Grooming – One common challenge that enterprises often face is deleting old or unneeded backup files. Some regulated industries even require old backups to be deleted after a certain period to maintain compliance. Data grooming allows you to set policies to automatically retain backups for a set time and then automatically delete them when they are no longer needed.
- Multiplexing – Larger organizations will likely have more backup sources and storage locations. Multiplexing allows for multiple backup writers to access the same disk in a coordinated fashion so that all the backup data can be written simultaneously to specific storage media.
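To make the deduplication and compression features above more concrete, here is a minimal sketch of a block-level store. It assumes fixed-size blocks identified by their SHA-256 digest and compressed with zlib. Commercial products may chunk, hash, and compress quite differently, and the `ChunkStore` class is hypothetical.

```python
import hashlib
import zlib
from typing import Dict, List

CHUNK_SIZE = 4  # Tiny on purpose for the example; real block sizes are far larger.


class ChunkStore:
    """Keeps one compressed copy of each distinct block of data."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}  # digest -> compressed block

    def put(self, data: bytes) -> List[str]:
        """Store data block by block and return the list of block digests."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only new content uses space
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Rebuild the original data from its list of block digests."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)


if __name__ == "__main__":
    store = ChunkStore()
    original = b"ABCDABCDABCDEFGH"
    recipe = store.put(original)
    print("blocks referenced:", len(recipe))
    print("blocks stored:", len(store.chunks))
    print("round trip ok:", store.get(recipe) == original)
```

The repeated `ABCD` block is stored once but referenced three times, which is where the space saving comes from. The extra hashing and compression work on every block is the processing cost mentioned above.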
Recovery Options for Enterprise Computer Data
Being able to efficiently and completely recover your backup is a critical part of the process. Recovery should be tested regularly as part of the standard operating procedure to familiarize the staff with the process and to minimize the chance of data loss. When planning for recovery, the two common goals that organizations should consider are:
- Recovery Time Objective (RTO) – This defines the goal for how long it should take to recover a backup. This includes the amount of time to detect that there is data loss, identify the appropriate backup to use for recovery, copy the backup file, restore it to the running system, then reconnect any dependent services or users. Organizations want to minimize the RTO so that they can bring their systems online as fast as possible after a disaster.
- Recovery Point Objective (RPO) – This defines the goal for the amount of data (in time) that can be lost during a disaster. This may be determined by the frequency and completeness of each backup. For example, if a backup is taken every hour, then up to an hour of data could be lost. Similar to the RTO, the organization should try to reduce its RPO.
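Both objectives can be checked with simple arithmetic. The sketch below assumes the worst-case data loss equals the backup interval and that recovery time is the sum of the steps listed under RTO. The function names and all figures are hypothetical.

```python
def worst_case_data_loss_minutes(backup_interval_minutes: float) -> float:
    """Up to one full interval of changes can be lost between backups."""
    return backup_interval_minutes


def estimate_recovery_minutes(
    detect_min: float,
    locate_min: float,
    backup_gb: float,
    copy_gb_per_min: float,
    restore_min: float,
    reconnect_min: float,
) -> float:
    """Sum the recovery steps: detect, locate, copy, restore, reconnect."""
    if copy_gb_per_min <= 0:
        raise ValueError("copy_gb_per_min must be positive")
    copy_min = backup_gb / copy_gb_per_min
    return detect_min + locate_min + copy_min + restore_min + reconnect_min


if __name__ == "__main__":
    # Hypothetical targets: lose at most 60 minutes of data, recover in 2 hours.
    rpo_target, rto_target = 60, 120
    loss = worst_case_data_loss_minutes(backup_interval_minutes=60)
    recovery = estimate_recovery_minutes(15, 10, 200, 5, 30, 10)
    print("meets RPO:", loss <= rpo_target)
    print("meets RTO:", recovery <= rto_target)
```

With these example figures the copy step alone takes 40 minutes, which shows how storage speed and network bandwidth feed directly into the RTO.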
Check out Altaro’s blog for more information about RTO and RPO. The final set of recovery options to consider is whether the recovery can be automated or whether it requires manual intervention to find the backup file on the storage media.
- Online Recovery – If the recovery server is connected to the storage media that hosts the backup file, recovery can be performed fairly quickly. The downside is that if malware, particularly ransomware, infects the primary server, the threat could also spread to the storage and damage the backup files, making them unrecoverable.
- Offline Recovery – This option is less convenient because the storage media holding the backup files is not directly connected to the recovery system. This means that when recovery is needed, the staff may need to manually find the disk or tape containing the backup before the recovery process can begin. Organizations may use this to reduce costs, but another advantage is that it is hard for this offline media to be infected by a datacenter-wide virus or malware.
This article was designed to provide you with an overview of computer data backup and recovery. It described some of the key features which organizations should look for when selecting their backup software vendor. There are many tradeoffs to be considered when evaluating the frequency of backups, the cost to store those backup files, and other desirable features. Altaro is one of the industry’s leading backup providers and supports a majority of the scenarios described in this article, so be sure to consider them as you plan your organization’s backup strategy.