To architect a backup solution that comprehensively covers all the organization’s needs, an IT professional must satisfy three criteria: location, type, and capacity. In other words, where will we put the data, what will we put it on, and how much space do we need? This article assumes that you already know your RPOs, RTOs, retention policies, and what you need to back up. We treat that knowledge as a storage “problem” for which we will architect a solution.
Location, Location, Location (Where to Back Up Data)
A proper backup solution does not answer the location question with “on-premises” or “offsite”. Treat this as an “and” phrase. Location governs the types of available storage and transfer speed of data. You will need to divide and direct your backup operations in a way that you can realistically achieve your RTOs, RPOs, and retention policies in any probable event. Because you need both onsite and offsite backup, you will need to consider multiple storage types. For the rest of this article, we will categorize media types according to their location options and talk about capacity at each point.
Public Cloud Backup Storage
If you choose to use a cloud provider to host backup data, then they will make most of the media type decisions for you. Instead, you will choose between speed and protection options.
Fully Managed or Self-Managed Cloud Backup
Many vendors offer fully managed cloud backup systems. They involve some sort of purpose-built agent for on-site and cloud-based operating systems. Some will also back up data directly from applications. The company that operates the cloud will provide options, as will third parties.
Alternatively, or additionally, you can use a cloud provider’s general purpose storage offerings as targets for your backup. You will determine what data to send, how to transmit it, and how to manage its lifecycle. Many on-premises backup applications can target this type of storage.
Speed, Capacity, and Price of Cloud Backup Storage
No matter the method, you will pay the provider for the storage that you use. As with most cloud services, vendors offer multiple storage tiers. Typically, price binds most closely to speed first, then capacity. If you can tolerate hours of waiting for data retrieval, then you can often get substantial amounts of cloud storage at extremely low cost. If you want your data to travel more quickly, then you’ll pay more for the same amount of storage.
In most cases, you will not concern yourself with the IOPS capability of storage that you use for backup. IOPS matters most in random-access compute workloads. Moving backup data across the Internet, even with the highest available public speeds, will not stress the IOPS limits of standard cloud storage. Purchasing higher tiers only to hold backup might speed up backup operations for resources in the same cloud datacenter, but probably not enough to justify the cost.
Data Integrity of Cloud Backup Storage
As a benefit, the cloud provider takes on some responsibility for the integrity of data on their systems. That does not make it completely foolproof. Read your agreements carefully. The provider likely has multiple terms that limit their liability. The provider may offer monetary restitution in case of data loss. They may fall back on that in place of complex data protection systems, especially on their less expensive tiers. Take time to understand what your subscription buys and plan accordingly.
Cloud backup storage offerings usually add geographic redundancy and other replication options for an additional cost. While not a perfect protection against corruption, replication does guard against localized events. Understand that a cloud replica, just like a private replica, can receive corrupted data without generating an error.
Private Remote Backup Storage
You have options besides public cloud for remote storage. You can choose a solution hosted by a third party (also called co-location) or host it yourself. Hosted data differs from cloud data in that you will purchase control over a specific set or subset of hardware in one or more specific locations. As technology progresses, some of the distinction between “cloud” and “hosted” will fade away, but the general difference relates to your ability to know the precise location of your data.
Third-Party Hosted Storage
Third-party hosted storage usually comes at a lower price than self-hosted storage. You will only pay for the hardware and the portion of the facility that you use, not the entire site. If you run a small business with a small budget but want to maintain more control over your data than a cloud provider allows, third-party hosting presents an attractive option. The primary drawback is that you do not have the same options for storage technology as you do for cloud or self-hosted storage. In trade, someone else deals with the physical component. You must configure the backup and restore software and transmission process, however.
Self-Hosted Remote Storage
If you already own a facility with significant geographical distance from your primary site, you might have an opportunity to host your own offsite backup. Even if you don’t have such a site, you might find that you can rent small office or even warehouse space at an affordable rate and use that as a disaster recovery site. You bear the facility costs, but you gain complete control over your backup and disaster recovery configuration.
Choosing and Using Remote Storage
Whatever remote option you choose, you take on the bulk of the responsibility for getting the data there. Whereas cloud providers go to great lengths to provide secure options for onsite-to-cloud data connectivity, hosting providers usually do not. They may offer secure FTP or something similar and nothing else. If you own both (or multiple) endpoints in a backup chain, then you control the security of data transmission. Because this article is not about security, I’ll leave a greater investigation of this subject for other articles. Take care to properly secure your inter-site data transmissions and at-rest data in remote sites. If you employ backup software in this task, such as Altaro’s Offsite Server, then it may include encryption capabilities to reduce your effort.
As for storage media, your choices with a third-party host are limited to whatever they offer. It may just be a big pool of data disks and nothing else. However, they might offer a second-tier backup option, such as copying your replicated data to tape or other locations. Consult with your provider to discover their options.
When you own the remote site, you have the same choices as you do for on-premises storage, which we cover in the next section. You must figure out how to adequately use it as protection for the primary site, but otherwise, it presents no special hurdles.
On-Premises Backup Storage
With on-premises storage, you take on the responsibility of deciding on media and target types. The balance of price, speed, and capacity works differently than cloud options, but still follows the same general pattern: more money for higher speed and capacity. You have plenty of options, which we’ll explore in general order of popularity.
Disk-Based Backup Storage
In the not-so-distant past, few could afford to use disk as their backup medium. The price per gigabyte continually falls, though. Even better, new market entrants have forced down the costs of chassis-based high-capacity storage. Even small budgets can now afford devices that will safely hold multiple backup copies.
This category includes four major sub-categories:
- SAN (storage area network): high-capacity, high-speed, high-cost block storage
- NAS (network-attached storage): high-capacity, moderate speed, moderate cost file storage
- Storage-heavy commodity servers: standard servers with hardware that focuses on storage capacity over other components
- External hard disks: this category includes temporarily attached disks, usually USB
SAN devices carry a high cost (due to low competition and premium features), but also give the greatest performance and scalability. In a completely greenfield project scoped specifically to backup, you probably would not choose SAN for backup. You can almost always scale out NAS or commodity servers to a space and network capacity that satisfies your backup needs at a lower cost. However, if you already have SAN capacity and do not anticipate needing it for production purposes for a long time, or if you want to have do-it-all storage, a SAN can meet the goal. Just take care that you do not use a single device for everything; backup must exist independently of its source data.
The major difference between NAS devices and commodity storage servers is that a NAS is purpose-built by the manufacturer to serve solely as a storage device. A commodity storage server could run typical compute or general-purpose workloads, although its hardware is usually configured to maximize disk space at the expense of compute capacity and it likely has some software enhancements tuned for storage. Traditionally, NAS devices are “file”, not “block” storage devices. A block storage device presents its storage to consumers like a locally-attached unformatted hard drive. A file storage device manages its own file system and can only present its space as a network share (e.g., SMB or NFS). Modern NAS devices often present block space as well, so the block vs. file distinction usually does not apply anymore. Currently, SAN and NAS devices differ mostly by capability, especially scalability. Most SAN devices allow you to distribute the same logical storage location across multiple devices; NAS devices do not. Commodity storage server software has begun to evolve beyond NAS capability toward SAN capability, but still matches more closely with NAS.
External hard disks have almost none of the capabilities of NAS or SAN devices, but you can transport them easily and keep them in a disconnected state between backups. You trade speed and convenience for safety.
Drawbacks of Disk-Based Storage
The speed of NAS and SAN comes with a weakness: they remain physically and logically connected to the rest of your network at all times. Therefore, any bad things that occur anywhere will find their way to the backup system sooner rather than later. Ransomware instantly comes to mind, but you also need to worry about other malware, data corruption, malicious activity, and physical catastrophe. While you can defend against such problems, you can never consider online storage as truly safe.
External disks protect against most of the problems in the previous paragraph. They will faithfully copy any ransomware or other malware that reaches them, unfortunately, but if you know that a network has an infection then you just need to leave your disks disconnected. Also, you can quickly and conveniently transport external disks offsite. However, if you require several external disks per day to contain all your backup data, switching them can become a tedious chore.
Magnetic disks always carry some risk of data corruption due to magnetic fluctuation. When powered, disk controllers can mitigate some of that, especially in arrays. When unpowered, you don’t have as much local magnetic activity to worry about, but background radiation and the lack of any active control mechanism present their own problems. Following backup best practices will give you the greatest protection (always maintain multiple distinct copies and test periodically).
SSD for Backup Storage
I guess that you hit this sub-section because you have one particular question: Can I use SSD for backup data? Honestly, we don’t know. We know that when heated, SSDs lose data quickly. When I say heated, I don’t mean oven temperatures; I mean, like, backseat of your car in June (northern hemisphere) temperatures. How quickly? Well… that varies. At room temperature, you can probably trust an unplugged SSD for a year. If you chill it, maybe it will stretch out to ten years.
How much does this matter? Remember your best practices: keep multiple distinct copies, test regularly. If you keep them in a reasonably cool location and periodically energize them to verify their contents, then they will probably last just fine. Neglect them, and their cells will discharge. But, you probably won’t find any SSD manufacturer that will guarantee anything.
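The periodic verification described above could be scripted along these lines. This is only a minimal sketch, assuming that you record a SHA-256 manifest right after writing the backup; the function names and the manifest layout (relative path mapped to digest) are hypothetical and do not come from any backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backup files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a digest for every file under root (run right after the backup)."""
    root = Path(root)
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose contents changed."""
    root = Path(root)
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Keep the manifest somewhere other than the disk that it describes. An empty list from `verify_manifest` only tells you that the files still match what was written; it does not replace a real restore test.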
Tape Backup Storage
We have used tape for so long that most people still think of it first when anyone mentions backup. It wins over disk when transportability and long-term storage matter most. However, it has fallen significantly behind disk in speed metrics. It also lags in dollar-per-gigabyte and capacity-per-unit scores. Furthermore, while you can still find functional PATA and wide SCSI controllers for disks that have probably long since demagnetized, you might struggle a bit more to find a working tape drive to read that ten-year-old tape that has probably lost almost no consistency. You can plan for drive obsolescence by storing a working unit with tapes when you switch technologies, but the organizations that don’t do that greatly outnumber the ones that do; they often switch technologies because the previous drive failed, and they wanted a new one anyway.
To overcome speed and capacity limits, you can employ tape libraries. These further increase the advantages of disk over tape in capacity-per-dollar, but they allow you to maintain the portability and durability advantages of tape. To balance that out, you can craft disk-to-disk-to-tape strategies that use disk for short-term rotation and tape for less frequent long-term backups.
As time goes on, disk will almost certainly replace tape. As solid-state costs fall and reliability increases, we will eventually use it for everything.
Optical Media Backup Storage
I mostly mention optical media for completeness. Once upon a time, we all believed that the high capacity and near indestructibility of optical disc would solve all our backup problems. Then optical hit a per-unit capacity ceiling somewhere around 100GB while spinning disk capacities continued to double. While tape couldn’t keep up with other magnetic storage, it outpaced optical in every metric. Also, it turned out that even though pressed optical media lasts effectively forever, the typical “burned” variety will lose data after a few years. Pressed discs have an indefinite lifespan, but an exorbitant cost.
To set expectations, let’s say that you have 4TB of data to store. You select 100GB M-Discs. You’ll need 40 discs which will need 40 hours to burn at a cost of $1,600. All that gets you exactly one copy – assuming no errors occurred while burning any disc. However, the M-Disc apparently has a substantially longer lifespan than other optical media, so that’s a benefit.
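If you want to rerun that arithmetic with your own numbers, a few lines will do it. This sketch treats 4TB as 4,000GB; the one hour and $40 per disc are simply the per-disc figures implied by the totals above, and the function name is my own.

```python
import math


def optical_plan(data_gb, disc_gb, hours_per_disc, cost_per_disc):
    """Return (disc count, burn hours, total cost) for one copy of the data."""
    discs = math.ceil(data_gb / disc_gb)
    return discs, discs * hours_per_disc, discs * cost_per_disc


# One copy of 4,000 GB on 100 GB discs at one hour and $40 per disc.
discs, hours, cost = optical_plan(4000, 100, 1, 40)
print(discs, hours, cost)  # 40 40 1600
```

Multiply the result by the number of copies that you intend to keep, and leave room for discs that fail to burn.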
Using Multiple Locations for Backup
To fit the minimal definition of backup, you need one complete copy of your data onsite (production data) and one complete copy offsite (cold data). You can build a robust solution on top of that foundation. This will require knowledge of RTOs, RPOs, data size, and budget. The following items show a few examples to help you architect.
- Disk-to-disk-to-disk: You can back up your production data to a live, always-on local storage location, then duplicate that on external disks.
- Disk-to-disk-to-offsite: Instead of external disks in the previous example, you can make the second hop transmit to a remote facility, either cloud or hosted. However, you must employ some precautions that prevent every backup from being accessible online. Without cold data, you expose yourself to malware and other malicious activity.
- Disk-to-disk-to-tape: This replaces the external disk in the disk-to-disk-to-disk scheme with a tape drive in the last step. Tape allows for very long-term archival storage, so you can use it less frequently to capture complete backups.
- Disk-to-offsite-to-offsite: You can transmit data offsite to multiple locations in parallel or in a chain or in any other configuration that your software and processes allow. You could configure a mesh pattern that maintains many copies in many disparate locations. Remember the importance of cold data and testing, though.
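One way to sanity-check a layout like the ones above is to write down each copy and look for the two basics: something offsite and something cold. The sketch below is purely illustrative; the `BackupCopy` fields and the `plan_gaps` function are hypothetical names, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    name: str
    offsite: bool  # stored away from the primary site
    online: bool   # reachable over the network between backup runs


def plan_gaps(copies):
    """List the basic protections that a set of backup copies is missing."""
    gaps = []
    if not any(c.offsite for c in copies):
        gaps.append("no offsite copy")
    if not any(not c.online for c in copies):
        gaps.append("no cold (offline) copy")
    return gaps


# A disk-to-disk-to-offsite layout where every copy stays online.
d2d2offsite = [
    BackupCopy("local NAS", offsite=False, online=True),
    BackupCopy("cloud bucket", offsite=True, online=True),
]
print(plan_gaps(d2d2offsite))  # ['no cold (offline) copy']
```

Adding a rotated external disk or a tape to that list clears the warning, which matches the precaution noted for the disk-to-disk-to-offsite scheme.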
Backup Capacity, Revisited
We talked about capacity in the individual sections, but it might help to cover it more generically. Properly layered, all media types and locations provide effectively infinite capacity. Your barriers come from time and money. In speed order:
- SAN
- NAS/commodity storage server
- External disk
- Tape
Cloud and hosted speed depends on your connectivity and the technology in place at the receiver, but typically ranks between external disk and tape. If you have very high speed Internet or you have the budget for fiber connections between your sites, then it can approach SAN speeds.
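To see where your own link falls, estimate how long a backup set takes to cross it. This is a rough sketch: the function name is made up, and the `efficiency` factor is an assumed fraction of the rated speed that backup traffic actually achieves, so measure your own connection before trusting the result.

```python
def transfer_hours(data_gb, link_mbps, efficiency=0.8):
    """Estimate hours to move data_gb over a link rated at link_mbps.

    efficiency is an assumed fraction of the rated speed that backup
    traffic actually achieves; it is not a measured value.
    """
    usable_mbps = link_mbps * efficiency
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    return megabits / usable_mbps / 3600


# 500 GB over a 100 Mbps link at the assumed 80% efficiency.
print(round(transfer_hours(500, 100), 1))  # 13.9
```

Compare the result against your RTO for restores as well as your backup window, because the same link carries the data back.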
Expense ordering depends on how you prioritize metrics. If you value longevity, then tape provides the least expense per year. If you value capacity per dollar, then on-premises NAS rules all. Consider transport convenience and the hot/cold applications.
The final piece to consider is your need for rotation and retention. To start, think of the venerable grandfather-father-son (GFS) rotation scheme. Each day’s backup overwrites the previous day’s (son) until a specified day of the week. On that day, you perform a complete backup, which you overwrite on the same day the following week (father). Once each month, you remove that weekly backup copy from the rotation, store it (grandfather), and introduce a new “father” tape for the following month. Depending on retention needs, some organizations overwrite the grandfather each month and retain only an end-of-year backup. GFS has other variants; some organizations only write differentials on the “son” tape due to time constraints, others have tapes for each day that they use to hold incrementals.
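The basic scheme can be expressed as a small date rule. The sketch below is one possible reading, not a standard: it assumes Friday as the weekly full backup day and treats the last Friday of each month as the copy that leaves the rotation.

```python
from datetime import date, timedelta

FULL_BACKUP_WEEKDAY = 4  # Friday (Monday is 0); an assumed choice


def gfs_role(day):
    """Classify a calendar day as 'son', 'father', or 'grandfather'."""
    if day.weekday() != FULL_BACKUP_WEEKDAY:
        return "son"
    # The last weekly full of the month is the one pulled from rotation.
    if (day + timedelta(days=7)).month != day.month:
        return "grandfather"
    return "father"


for d in (date(2024, 5, 23), date(2024, 5, 24), date(2024, 5, 31)):
    print(d, gfs_role(d))
```

The three sample dates are a Thursday, a mid-month Friday, and the last Friday of May 2024, so they come back as son, father, and grandfather in that order.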
GFS was designed for tape systems, but the same concepts apply to modern disk-based systems. For example, think of Altaro’s deduplication technology. Every backup that invokes a deduplication pass works like a “son” in GFS. The latest full backup was the “father”. To recover from a complete loss, you must have at least one “father”.
With that understanding, you can now properly estimate your space requirements. Every “father”, regardless of location or media, requires approximately 100% of the size of the production data. While many technologies offer compression, do not overestimate its savings. The size of each “son” depends on daily churn rates. The impact of that churn will vary greatly between backup technologies; a full backup requires as much space as a “father”. Differential backups require less, and incremental backups require less still. Compression may play a factor, but again, beware optimistic expectations. Advanced technologies like Altaro’s deduplication can have a substantial shrinking effect on “son” sizes. In all cases, it will require an understanding of your organization’s typical daily activities to predict the size of non-full backup runs. You will almost certainly need some experience with your chosen backup tool(s) for useful predictions, but most organizations churn a very low percentage of their total data on a daily basis.
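A rough estimator makes the difference between incremental and differential “sons” easier to see. This sketch assumes no compression or deduplication and assumes that each day’s churn touches different data, which is the worst case for differentials; the function and parameter names are hypothetical.

```python
def estimate_capacity_gb(production_gb, fathers, sons_per_father,
                         daily_churn, differential=False):
    """Rough space needed for retained fulls plus their non-full runs."""
    full_space = fathers * production_gb
    daily_change = production_gb * daily_churn
    if differential:
        # Each differential holds all changes since the last full.
        son_space = sum(daily_change * n for n in range(1, sons_per_father + 1))
    else:
        # Each incremental holds only one day's changes.
        son_space = daily_change * sons_per_father
    return full_space + fathers * son_space


# 2,000 GB of production data, four weekly fulls, six sons each, 2% daily churn.
print(estimate_capacity_gb(2000, 4, 6, 0.02))
print(estimate_capacity_gb(2000, 4, 6, 0.02, differential=True))
```

Treat the output as a ceiling to refine once you have real numbers from your backup tool, since actual churn and compression will move it.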
From here, you have simple addition to perform. If it helps, map it out. Example:
(Figure: backup additions example)
With a quick sketch, I see that I can buy one 4TB disk to handle the backup workload and budget for a new disk each month. I will rotate out the June and December disks, so each year I will need to replace two of the monthly disks. If I can get an external 4TB drive for $100, then my total spend for the first year (one working disk plus twelve monthly disks) comes to $1,300. Replacing the June and December disks will cost $200 in each following year. Budget a few hundred dollars for the occasional drive failure, add in software expenses, and you’ve effectively predicted your backup budget for the next 3 years or more. At 5% annual growth, you will exceed the capacity of a 4TB drive in a few years, but falling disk prices will likely handle that without a budget adjustment.
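The same budget arithmetic fits in a couple of small functions. This is a sketch with made-up function names; the 3.5TB starting size in the example is hypothetical, so substitute your own measured figure.

```python
def first_year_cost(disk_price, monthly_disks=12, working_disks=1):
    """Cost of the working disk plus one new disk for each month."""
    return disk_price * (monthly_disks + working_disks)


def years_until_full(start_tb, capacity_tb, annual_growth):
    """Count the years of compound growth until the data outgrows one disk."""
    if start_tb <= 0 or annual_growth <= 0:
        raise ValueError("start_tb and annual_growth must be positive")
    years = 0
    size = start_tb
    while size <= capacity_tb:
        size *= 1 + annual_growth
        years += 1
    return years


print(first_year_cost(100))             # 1300
print(years_until_full(3.5, 4.0, 0.05))  # 3
```

Rerun `years_until_full` whenever your growth rate changes; a small difference in the rate moves the replacement date more than you might expect.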
Keep It Going
In my last example, I used only external disks. If you keep the daily disk onsite and keep the weekly and monthly disks offsite far enough away from the primary site to survive any probable disaster, then you have a perfectly viable backup solution. Is that enough? You decide. Add on and expand to other options as your budget allows. The only line for “too much backup” is “way over budget”. Keep copying, and remember to test that backup!