My experience in the VM backup space started more than 7 years ago. In that time I have come across many questions, problems and scenarios for VM backups, and have worked with a range of technologies and operating systems (C#, WCF, HTML, TypeScript, C++, JavaScript, cloud APIs, SDS APIs, Windows, Linux). In this article, I want to share what I have learned about healthy backup practices for SMBs (Small and Medium Businesses).
My experience is that a product should always target a specific market. Trying to cover everything adds a lot of complexity and hurts usability. SMB organizations typically have fewer than 300 VMs and less than 25TB of storage. This leads to one of the first things to keep in mind about backup: consider your infrastructure, both its size and its topology, and choose the backup tool and environment that fit your hypervisor vendors and your scale (number of VMs and volume of data).
One of the most important questions is how often you make backups, and how often and for how long you have to store and archive them. The backup frequency follows from what customers refer to as the RPO (Recovery Point Objective), the maximum amount of recent data you can afford to lose. If you have no legal requirements for retention and archiving, you should apply the 3-2-1 rule as a minimum standard. This rule says that you should have at least 3 copies of the data, stored on 2 different media types, with 1 of these copies always kept offsite. For example, a good approach could be a D2D2T (disk to disk to tape) policy, where data is backed up to disk and then copied to tape, or, even better, a D2C2T (disk to cloud to tape) policy, where data is stored on disk and replicated both to a cloud storage target and to tape.
Archiving is very important, and keeping long-term archives can be business-critical from both a recovery and a compliance perspective. If you have data that has been slowly damaged over time and you need an older copy to reconstruct it, you will be very happy to have an archive copy to go back to. If you plan to do long-term archiving, watch out for media deterioration so that you avoid a bad surprise in a moment of need. Testing your backups is a chore, but it's essential.
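As a rough illustration of the 3-2-1 rule described above, the following Python sketch checks a planned set of copies against the three conditions. The `BackupCopy` class, its fields and the example plan are hypothetical names made up for this example; they do not come from any backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str      # hypothetical label for the copy
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool  # True if the copy is kept away from the primary site


def satisfies_3_2_1(copies):
    """Return True if the copies meet the 3-2-1 rule."""
    enough_copies = len(copies) >= 3                   # at least 3 copies
    enough_media = len({c.media for c in copies}) >= 2  # on 2 media types
    has_offsite = any(c.offsite for c in copies)        # 1 copy offsite
    return enough_copies and enough_media and has_offsite


# A D2C2T-style plan: local disk, cloud replica, tape.
plan = [
    BackupCopy("local repository", "disk", offsite=False),
    BackupCopy("cloud replica", "cloud", offsite=True),
    BackupCopy("weekly tape", "tape", offsite=True),
]
print(satisfies_3_2_1(plan))
```

Dropping the tape copy from this plan makes the check fail, because only two copies remain.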
Another option to consider is incremental backups, which save only the data that has changed since the previous backup. This saves time and storage space, but it increases the dependency on previous backups and can slow down the restore procedure. Keep in mind that for a VM with high disk I/O activity, the incrementals can also consume a large amount of space, and could easily use up your storage capacity if you don't plan your retention correctly.
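To get a feeling for how the change rate affects retention planning, here is a back-of-the-envelope estimate in Python. It is only a sketch under simplifying assumptions: every incremental is assumed to be a fixed fraction of the full backup size, and compression and deduplication are ignored. The function name and all numbers are hypothetical.

```python
def repository_size_gb(full_size_gb, change_rate, incrementals_per_chain,
                       chains_retained):
    """Estimate the space needed for the retained backup chains.

    Each chain is one full backup plus a number of incrementals, and each
    incremental is assumed to hold `change_rate` (0.0 to 1.0) of the full size.
    """
    incremental_gb = full_size_gb * change_rate
    chain_gb = full_size_gb + incrementals_per_chain * incremental_gb
    return chains_retained * chain_gb


# A 500 GB VM, weekly full plus 6 daily incrementals, 4 weeks retained.
quiet_vm = repository_size_gb(500, 0.05, 6, 4)  # 5% daily change
busy_vm = repository_size_gb(500, 0.30, 6, 4)   # 30% daily change
print(f"quiet VM: {quiet_vm:.0f} GB, busy VM: {busy_vm:.0f} GB")
```

With these assumed figures the busy VM needs more than twice the space of the quiet one for the same retention policy, which is the effect the paragraph above warns about.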
When it comes to the different media, the typical repositories are disk and tape. But in today's world, cloud storage is very affordable and available on demand, saving SMB organizations from a large outlay of CAPEX (Capital Expenditure). In all scenarios, data privacy is an important factor. Therefore, I recommend encryption as a minimum measure for all offsite backups, not only those sent to the cloud but also tapes, which are typically shipped offsite. Encryption costs extra CPU time, but the security is worth it.
One of the most important things, and one that is often forgotten, is to test backups. Trusting a software report that tells you the backup worked is not enough. Backups need to be tested, typically through manual restores or automated backup tests. Many vendors offer automated solutions that test a backup after the backup session has completed. It is also important to verify that the backed-up data is up to date and does not contain an older state of your VM.
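One simple way to go beyond the software report is to restore a file and compare its checksum with a value recorded when the backup was taken. The Python sketch below shows the idea using the standard `hashlib` module. The function names are hypothetical, and it assumes you keep the expected SHA-256 digest of a reference file somewhere safe; real products typically have their own verification mechanisms.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(restored_path, expected_digest):
    """Return True if the restored file has the digest recorded at backup time."""
    return sha256_of_file(restored_path) == expected_digest
```

A test restore would then call `restore_matches("restored/reference.dat", recorded_digest)`, where both the path and `recorded_digest` are placeholders for your own values.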
A topic that people often forget when they talk about backup is the restore process and restore planning. It is important to have a good idea of what the process is and how long it may take from each of your media. Customers often refer to this as the RTO (Recovery Time Objective). To choose the correct procedure, think through all the scenarios you want to be protected against. To do so, try to answer the following questions (the answer may be different for every VM). If you lose your VM:
Will you need to restore the whole VM, or only the data/subset of the data within it?
If you only need data, you may consider file-level or item-level restore (e.g. a single file or MS Exchange item), which lets you extract only the data you need without restoring the entire VM.
If you need the VM restored, how much time can you give the restore process (seconds, minutes, hours) before you start losing clients or money?
Some solutions let you recover your VM instantly. It may run with reduced performance for a short time, but it could save you from a lengthy downtime. Handled correctly, you will easily be respected as an IT master ;-)
Note: Remember not to over- or underestimate the importance of what you want to restore. You don't want to spend money if you don't need to, but you don't want to find yourself in a situation without solutions either. It's a fine balance.
Always keep in mind that backup is only the necessary step; the goal is to have your data archived and, in the worst case, to suffer only a short outage. It is best to look for vendors that offer the option to restart the VM directly from the backup location. That way you avoid waiting for a long restore process to finish.
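To compare media against an RTO, a rough estimate of the restore time is often enough. The following Python sketch simply divides the amount of data by the sustained throughput. The throughput figures are hypothetical placeholders (measure your own), 1 GB is taken as 1024 MB, and overheads such as tape mounting or VM boot time are ignored.

```python
def restore_hours(size_gb, throughput_mb_per_s):
    """Estimate the hours needed to restore `size_gb` at a sustained throughput."""
    seconds = size_gb * 1024 / throughput_mb_per_s
    return seconds / 3600


def meets_rto(size_gb, throughput_mb_per_s, rto_hours):
    """Return True if the estimated restore time fits within the RTO."""
    return restore_hours(size_gb, throughput_mb_per_s) <= rto_hours


# Hypothetical sustained restore throughput per medium, in MB/s.
media = {"disk": 200.0, "tape": 100.0, "cloud": 12.5}

for name, speed in media.items():
    hours = restore_hours(500, speed)
    print(f"{name}: {hours:.1f} h, meets 4 h RTO: {meets_rto(500, speed, 4)}")
```

With these assumed numbers, a 500 GB VM comes back from disk or tape well within 4 hours, while the cloud restore over a 12.5 MB/s link takes more than 11 hours. This is where instant recovery or file-level restore becomes interesting.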
Now let's look at some basic tips and tricks that I have learned while developing backup software and from customer feedback:
If you run many identical VMs (e.g. many Windows/Linux servers), use deduplication for your storage. It will save lots of disk space!
If you plan to use data encryption or compression, avoid deduplication, as it will be much less effective on encrypted or compressed data.
If you use cloud storage to archive your backups, be sure that you have enough bandwidth for the upload, and plan for long restore times.
Avoid overly long incremental chains by doing full backups as frequently as is practical (e.g. during weekends). This reduces the dependency between backups and the loss of restore performance.
Most VM software vendors take a snapshot at the hypervisor OS level by default. On VMs with high disk I/O activity, this may have side effects: the VM may be unresponsive for a short while during snapshot consolidation, and during the backup process the snapshot data will grow and occupy more of your storage. If your storage array supports snapshots, use these, as they avoid this impact on the running VM.
Keep in mind that the level of consistency offered by a backup may vary: always try to obtain application-consistent backups!
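The tip about incremental chains can be made concrete with a small Python sketch. It lists the backup files needed to restore a given point in time: the most recent full backup at or before that point, plus every incremental after it. The function name and the data layout (a list of label and kind pairs, oldest first) are hypothetical and not taken from any product.

```python
def restore_chain(backups, target):
    """Return the labels of the backups needed to restore `backups[target]`.

    `backups` is a list of (label, kind) tuples, oldest first, where kind
    is either "full" or "incr".
    """
    # Walk backwards from the target until a full backup is found.
    for start in range(target, -1, -1):
        if backups[start][1] == "full":
            return [label for label, _ in backups[start:target + 1]]
    raise ValueError("no full backup precedes the requested restore point")


backups = [
    ("Sun-1", "full"), ("Mon-1", "incr"), ("Tue-1", "incr"),
    ("Wed-1", "incr"), ("Thu-1", "incr"), ("Fri-1", "incr"),
    ("Sat-1", "incr"), ("Sun-2", "full"), ("Mon-2", "incr"),
]
print(restore_chain(backups, 6))  # Saturday of week 1
print(restore_chain(backups, 8))  # Monday of week 2
```

Restoring the first Saturday needs all seven files of that week, and losing any one of them breaks the restore. Right after the weekend full backup, the chain is only two files long.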
In conclusion, the virtual environment evolves every day. Keep up to date with the various technologies at your disposal, but make sure that your environment always stays stable. The best approach is to spend time testing your options and their various outcomes. It may take some of your time at the beginning or during updates, but you will avoid bad surprises in the future.
D2D2T: Disk-to-disk-to-tape is a data storage and backup technique where data is backed up to a disk before it is copied to a backup tape device. The process first stores a copy of the primary disk content on another disk, and then copies it to the backup tape device.
D2C2T: Disk-to-cloud-to-tape is similar to D2D2T, but this data backup process first stores the primary disk content in the cloud and then on the backup tape device.
Full backup: A backup of all the data of a VM. Its advantages are that it does not depend on previous backups and that the restore process is the fastest, but it occupies the most storage space.
Incremental backup: A backup of all the data changed since the last backup. It reduces the storage space needed, but the restore process is slower than with full backups.
Deduplication: This is a specialized data compression technique for eliminating duplicate copies of repeated data. Related and somewhat synonymous terms are intelligent (data) compression and single-instance (data) storage.
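To illustrate the idea behind deduplication, here is a minimal Python sketch that splits data into fixed-size chunks and stores each distinct chunk only once, keyed by its SHA-256 hash. This is a deliberately simplified assumption for illustration: real products may use variable-size chunking and on-disk indexes, and the function names are hypothetical.

```python
import hashlib


def deduplicate(blobs, chunk_size=4096):
    """Split each blob into chunks and store every distinct chunk once.

    Returns the chunk store (digest -> bytes) and, for each blob, the
    ordered list of digests ("recipe") needed to rebuild it.
    """
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # keep only the first copy
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def rebuild(store, recipe):
    """Reassemble a blob from its recipe."""
    return b"".join(store[digest] for digest in recipe)


# Two toy "VM disks" that share the same operating system blocks.
os_blocks = b"A" * 4096 + b"B" * 4096
vm1 = os_blocks + b"1" * 4096
vm2 = os_blocks + b"2" * 4096

store, recipes = deduplicate([vm1, vm2])
print(len(store), "unique chunks stored for", sum(len(r) for r in recipes), "chunks in total")
```

In this toy example the shared blocks are stored once, so 4 chunks are kept instead of 6. Encrypting or compressing the data beforehand usually changes the bytes so that identical source blocks no longer hash to the same value, which is why deduplication is less effective in that case.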