By: Vitor Montalvao | Updated: 2020-04-13
Problem
I need my VMs to have the ability to recover from failures and continue to function in a way that avoids downtime or data loss. How can I achieve that in Azure?
Solution
In this article we are going to look at the High Availability, Disaster Recovery and Backup options available in Azure for virtual machines.
High Availability in Azure
We all want our systems to be available 100% of the time with no failures, but this is almost impossible to achieve in the IT world. Since Azure VMs run on physical servers hosted in Azure datacenters, the chance of a failure is always there (when a physical server fails, the virtual machines hosted on that server also fail). Failures are not the only threat to a machine's availability: regular hardware and software updates sometimes require a VM to be rebooted, making it unavailable for a short period of time.
Note: Microsoft does not automatically update your VM's OS or software. You have complete control and responsibility for that. However, the underlying software host and hardware are periodically patched to ensure reliability and high performance at all times.
The good news is that Azure offers options that can provide availability from 99.9% to 99.999% for VMs.
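To make these percentages more concrete, the short Python sketch below converts an availability target into the downtime it still allows. It is only an illustration: the function name and the 365-day year and 30-day month are assumptions made for this example, not part of any Azure tool or of how Microsoft measures its SLAs.

```python
# Convert an availability percentage into the downtime it still permits.
# Assumption for this sketch: a 365-day year and a 30-day month.

MINUTES_PER_YEAR = 365 * 24 * 60
MINUTES_PER_MONTH = 30 * 24 * 60


def allowed_downtime_minutes(availability_percent: float, period_minutes: int) -> float:
    """Return the minutes of downtime permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99, 99.999):
        per_year = allowed_downtime_minutes(target, MINUTES_PER_YEAR)
        per_month = allowed_downtime_minutes(target, MINUTES_PER_MONTH)
        print(f"{target}% -> {per_year:.1f} min/year, {per_month:.1f} min/month")
```

Under these assumptions, 99.9% still leaves room for almost nine hours of downtime a year, while 99.999% leaves only a little over five minutes.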
Azure Availability Sets
To achieve resiliency, we need to build redundancy in Azure to avoid a single point of failure. This starts with running at least two instances of each VM (availability set), extends to placing the VMs in more than one datacenter (availability zones), and can even spread across more than one region (region pairs).
Microsoft offers a 99.95% service level agreement (SLA) for multiple-instance VMs deployed in an availability set. Below is a figure with an example of an availability set for IIS and SQL Server:
How does an availability set work? As per the above image, each instance of a VM is created in a different rack, which means on a different physical server. If there's an issue or outage on the physical hardware where one VM is running, the other VM instance keeps running because it is on different hardware. All of this is managed automatically by Azure. You just need to create the necessary availability sets and create your VMs in the proper availability set for the solution you want to provide. The recommendation is to place VMs in the same availability set only when they provide an identical set of functionalities and have the same software installed.
Availability sets are also used for software upgrade and OS patching strategies. VMs are updated per update domain: Azure updates only one update domain at a time, reboots all VMs in that update domain together, and moves on to the next update domain only after they have all come back online.
In review, an availability set is a logical feature used to ensure that a group of related VMs is deployed so that they aren't all subject to a single point of failure in the datacenter.
Creating an availability set requires a Name and the Region where the availability set will be created. After that, you define how many Fault Domains and Update Domains will be part of the availability set configuration:
Each Fault Domain is a group of VMs that share a common power source and a physical network switch. You need a minimum of 2 Fault Domains to create an availability set, and the maximum is 3.
The Update Domain is a group that Azure uses to patch your virtual machines and restart them together (all VMs in an Update Domain are rebooted at the same time). Azure never restarts more than one Update Domain at a time, so plan your layout so that the reboots don't cause an outage: always keep at least one instance of the VM running so that the service to your business and users is never interrupted. You can have up to 20 Update Domains.
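The effect of fault and update domains can be illustrated with a small Python model. This is a sketch under stated assumptions, not Azure's actual placement logic: the round-robin assignment, the function names and the default counts (2 fault domains, 5 update domains) are all chosen for this example. Only the limits (2 to 3 fault domains, up to 20 update domains) come from the text above.

```python
def assign_domains(vm_names, fault_domains=2, update_domains=5):
    """Place VMs round-robin across fault and update domains.

    Illustrative model only; the real placement is decided by Azure.
    """
    if not 2 <= fault_domains <= 3:
        raise ValueError("fault_domains must be 2 or 3")
    if not 1 <= update_domains <= 20:
        raise ValueError("update_domains must be between 1 and 20")
    return {
        name: {
            "fault_domain": index % fault_domains,
            "update_domain": index % update_domains,
        }
        for index, name in enumerate(vm_names)
    }


def vms_left_running(placement, rebooting_update_domain):
    """Return the VMs that stay up while one update domain is rebooted."""
    return sorted(
        name
        for name, domains in placement.items()
        if domains["update_domain"] != rebooting_update_domain
    )


if __name__ == "__main__":
    placement = assign_domains(["web-0", "web-1", "web-2"], update_domains=2)
    for update_domain in range(2):
        running = vms_left_running(placement, update_domain)
        print(f"update domain {update_domain} rebooting, still running: {running}")
```

In this model, as long as the VMs are spread over at least two update domains, every reboot round leaves at least one instance running.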
If the VMs use Managed Disks, the availability set will also be a Managed Availability Set, meaning that Azure ensures that all the managed disks attached to a VM are within the same managed disk fault domain.
NOTE: While you don't pay for the Availability Set feature you still need to pay for each VM instance that you create in the availability set.
Azure Availability Zones
Within each Azure Region that supports them, there are at least three physically separated zones known as Availability Zones. Each Availability Zone is made up of one or more datacenters. If you use Availability Zones to replicate your VMs, you can protect your apps and data from the loss of a datacenter. If there's an outage in one zone, the replicated VMs are still available in the other zones and the service you provide to your users is not disrupted. With this solution Microsoft can offer an SLA of 99.99% for your VMs' uptime.
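The reasoning above can be checked with a few lines of Python. The sketch below is a toy model, with hypothetical VM names and function names, that tests whether a given placement of VMs across zones keeps at least one VM running when any single zone is lost. It does not query Azure.

```python
def surviving_vms(zone_by_vm, failed_zone):
    """Return the VMs that are not in the failed zone."""
    return sorted(vm for vm, zone in zone_by_vm.items() if zone != failed_zone)


def tolerates_any_single_zone_outage(zone_by_vm):
    """True if at least one VM survives the loss of any one zone in use."""
    zones_in_use = set(zone_by_vm.values())
    if not zones_in_use:
        return False  # no VMs deployed, so nothing can survive
    return all(surviving_vms(zone_by_vm, zone) for zone in zones_in_use)


if __name__ == "__main__":
    spread = {"sql-a": 1, "sql-b": 2, "sql-c": 3}   # one VM per zone
    stacked = {"sql-a": 1, "sql-b": 1}              # both VMs in zone 1
    print(tolerates_any_single_zone_outage(spread))
    print(tolerates_any_single_zone_outage(stacked))
```

The check only passes when the VMs are placed in at least two different zones, which is the point of replicating across zones in the first place.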
You define the usage of the availability zones during the creation of a resource. Below is an example for VM creation:
As with availability sets, you don't pay for the availability zone feature itself. You do pay for each resource instance, and also for VM data transfer between availability zones, since this is considered outbound data transfer. You can check for the bandwidth pricing here.
Azure Site Recovery
Azure Site Recovery is a service that provides replication, failover and recovery options. With this service you can replicate Azure VMs, and even on-premises VMs and physical servers, from a primary location to a secondary location such as a different region. When an outage happens at your primary site, the workload fails over to the secondary location, allowing your users to continue to access your applications. You can then fail back to the primary location once it's up and running again. The Azure Site Recovery service isn't free and you can find the pricing here.
You can enable the Disaster Recovery (DR) from the Azure VM configuration as shown in the below image:
In Azure, regions work in pairs, so the VM for which you configure the DR settings will be replicated to the region paired with its own. You can learn more about Azure Paired Regions here.
After choosing the target region, you can click the Review + Start replication option to enable the VM replication. You will be able to verify the replication status after it finishes.
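The region pairing mentioned above amounts to a simple lookup. The Python sketch below hard-codes a small, hand-written subset of region pairs purely for illustration. The function name is hypothetical, the table is not complete, and the official Azure Paired Regions list remains the authoritative source.

```python
# A small, hand-written subset of Azure region pairs (illustration only).
_KNOWN_PAIRS = [
    ("eastus", "westus"),
    ("eastus2", "centralus"),
    ("northeurope", "westeurope"),
    ("southeastasia", "eastasia"),
    ("uksouth", "ukwest"),
]

# Pairing is symmetric, so store both directions.
PAIRED_REGION = {}
for first, second in _KNOWN_PAIRS:
    PAIRED_REGION[first] = second
    PAIRED_REGION[second] = first


def default_dr_target(source_region: str) -> str:
    """Return the paired region for a source region, if it is in the table."""
    key = source_region.replace(" ", "").lower()
    try:
        return PAIRED_REGION[key]
    except KeyError:
        raise LookupError(f"no pair recorded for {source_region!r}") from None


if __name__ == "__main__":
    print(default_dr_target("North Europe"))
```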
Availability sets and zones are features that help keep the services running in your VMs available to your users and customers, but they aren't foolproof. We know that data can be corrupted and software can have bugs, so what solutions does Azure offer for these kinds of issues? Let's take a look at other disaster recovery and backup techniques.
Azure Backup
Previously we saw how to keep your Azure servers running despite significant disruptive events. Now we will see what Azure offers to recover from a disaster.
The Azure Backup service offers protection for physical and virtual machines, whether they are in the cloud or in on-premises datacenters. It also offers protection for other services such as Azure File Shares, SQL Server and other databases running on Azure VMs. In addition, it provides application-consistent backup, meaning that a recovery point has all the data required to restore the backup copy for a given application. Below is a figure that shows what can be backed up with the Azure Backup service.
In terms of cost, here is the pricing, but as usual in Azure, you just pay for what you use. Azure Backup neither charges for nor limits the data that is transferred (inbound or outbound).
Azure Backup keeps your data secure by encrypting it and you can define the retention for your backup data with no limit for the retention time.
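To show what a retention setting means in practice, here is a minimal Python sketch of a daily retention rule. It is a hypothetical model written for this article, not how Azure Backup implements its policies: the function name and the rule "keep every recovery point taken within the last N days, counting today" are assumptions.

```python
from datetime import date, timedelta


def recovery_points_to_keep(recovery_points, today, retention_days):
    """Return the recovery points that are younger than retention_days."""
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return sorted(
        point
        for point in recovery_points
        if 0 <= (today - point).days < retention_days
    )


if __name__ == "__main__":
    # One recovery point per day from 2020-04-01 to 2020-04-13.
    points = [date(2020, 4, 1) + timedelta(days=offset) for offset in range(13)]
    kept = recovery_points_to_keep(points, today=date(2020, 4, 13), retention_days=7)
    print([point.isoformat() for point in kept])
```

With a 7-day retention and one backup per day, this rule keeps the seven most recent recovery points and lets the older ones expire.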
The backups are stored in a Recovery Services vault with built-in management of recovery points, so you can easily restore as needed. The Recovery Services vault offers centralized monitoring and management, with built-in monitoring and alerting capabilities.
Azure Backup automatically allocates and manages backup storage using the power and scalability of Azure to deliver high availability.
By default, Azure Backup uses geo-redundant storage (GRS), where the data is replicated to a secondary region, but it also offers locally redundant storage (LRS), where all copies of the data exist within the same region.
By default, an Azure VM doesn't have a backup policy. To configure one, go to the Backup option under Operations, create a Recovery Services vault or select an existing one, create or choose a backup policy for the VM, and finally click the Enable Backup button:
The Azure Backup service uses the Microsoft Azure Recovery Services (MARS) agent to back up data from servers to an Azure Recovery Services vault. When you enable backup for an Azure VM, an Azure Backup VM extension is installed so that the entire VM can be backed up. If you want to back up specific files and folders, you'll need to install the MARS agent on the respective Azure VM.
For on-premises physical and virtual machines, you'll always need to download and install the MARS agent on the respective computers to enable them to back up to an Azure Recovery Services vault.
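The choice described in the last two paragraphs can be summarized as a small decision function. The Python sketch below only restates the rules given in this article. The function and parameter names are made up for the example and do not correspond to any Azure API.

```python
def backup_component(is_azure_vm: bool, whole_machine: bool) -> str:
    """Return which component the article says is needed for a backup scenario.

    is_azure_vm:   True for an Azure VM, False for an on-premises machine.
    whole_machine: True to back up the entire VM, False for files and folders.
    """
    if is_azure_vm and whole_machine:
        # Installed automatically when backup is enabled on the VM.
        return "Azure Backup VM extension"
    # Files and folders on an Azure VM, and any on-premises machine.
    return "MARS agent"


if __name__ == "__main__":
    print(backup_component(is_azure_vm=True, whole_machine=True))
    print(backup_component(is_azure_vm=True, whole_machine=False))
    print(backup_component(is_azure_vm=False, whole_machine=False))
```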
Azure Recommendations
You'll see that if you don't configure any High Availability or Backup options on your VMs, Azure will report the necessary recommendations for you to be able to build the expected resiliency for your machines:
Next Steps
- Learn about the SLA for Azure Virtual Machines.
- Check for the Azure availability best practices.
- Set up disaster recovery for Azure VMs.
- Back up an Azure VM from the VM settings.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.