In this article, we will talk about what happens with the data that is transmitted between Azure Regions and on-premises systems.
Nowadays, Microsoft is opening more and more data centers around the globe. Regions like Japan, the UK (to be announced), Brazil, Germany and Korea are already in place or soon will be.
These specific countries have laws requiring that data from specific industries does not leave the country. In the healthcare industry especially, it is common to have restrictions preventing patient-related information from leaving the country at any time. Similar laws exist for other areas, such as banking or data protection.
In this context, we need to be aware of these laws, of the solutions we come up with and of what actually happens with the content that is transmitted between Azure Regions and on-premises systems.
The main goal of this article is to cover what happens with content that is transmitted between Azure Regions and on-premises systems, and to draw your attention to the things that we should be aware of. Another article will cover the different configurations of our Azure Services and the ways in which they might affect us.
Assumption: We do not take into account Azure Service configurations that might move data to other regions. We assume that customers did not configure the Azure Service to back up or replicate content in other Azure Regions.
All the data that we store in an Azure Region will remain in that region. The content that is stored in a specific region will not leave that region. This means that we can store data in the UK Azure Region without any kind of problems. The content will not leave the UK Azure Region by magic and be stored in another region like Germany.
If the data centers of an Azure Region have technical problems, or even go down, our content will still not leave that region, or its Azure Paired Region when the Azure Services are configured to use one.
A very useful table can be found at the following link, which specifies the paired region for each Azure Region. As we can see in the link above, Japan East is paired with Japan West: the same country, just another location. There are some exceptions that will be covered by the next article.
All the content that is sent by wire from an Azure Region to an on-premises system will go through the local ISPs (Internet Service Providers) from that specific country. This means that content will not leave that specific country as long as the on-premises system is in the same country as the Azure Region.
Things become a little more complicated when the data passes from one ISP to another. An ISP might detect that a route that leaves the country and comes back is less loaded than the domestic one, and decide to use it. This means that, in transit, the content will leave the country. The same can happen inside a single ISP, depending on the route load and on how the load balancing is done.
A similar thing happens when a main route or a big local ISP is down and content is redirected through other routes. Even if we have a dedicated connection, say between continental Spain and the Canary Islands (which are part of Spain), data might flow through other routes, such as Egypt or the UK, when that connection is down (these countries are given only as an example). This means that, even with a dedicated connection, the content might take another route and leave our country.
For this scenario, you do not have any kind of control. If you have enough resources, you can build your own line between the two locations, but this is expensive. We should remember that, at transport level, we have no control over how the ISP sends the content and how the routing is done.
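A toy model can make this routing behaviour concrete. The sketch below is not how any real ISP selects routes: the `Route` type, the route names, the load figures and the country codes are all invented for illustration. It simply picks the least-loaded route that is up and reports whether that route crosses a border.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical route description; every value here is invented."""
    name: str
    countries: tuple  # country codes traversed, in order
    load: float       # current utilisation, 0.0 to 1.0
    up: bool = True


def pick_route(routes):
    """Pick the least-loaded route that is up (a deliberately naive policy)."""
    available = [r for r in routes if r.up]
    if not available:
        raise RuntimeError("no route available")
    return min(available, key=lambda r: r.load)


def leaves_country(route, home):
    """True if the route passes through any country other than `home`."""
    return any(country != home for country in route.countries)


domestic = Route("domestic", ("ES",), load=0.9)
detour = Route("detour", ("ES", "GB", "ES"), load=0.3)

chosen = pick_route([domestic, detour])
print(chosen.name, leaves_country(chosen, "ES"))
```

Marking the domestic route as down (`up=False`) gives the same result for the failure case: the only remaining route is the one that crosses the border.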
As we saw, restricting the data so that it does not leave a specific Azure Region is easy and can be controlled at Azure Services level. At transport level, things are more complicated, and there is no way to guarantee that data will not leave the country without a dedicated line or custom contracts with ISPs (when and where possible).
In the lines that follow, we must understand which replication features might not be compliant with our requirement: data and payload must not leave the country where the Azure Region is established. We will take a look at the most common and important Azure Services.
However, before moving on, we should talk a little about the Azure Paired Regions.
An Azure Region is an area within a specific geographical region that can contain one or more datacenters. An Azure Paired Region is another region within the same geographical region that is used for Business Continuity and Disaster Recovery (BCDR). Basically, a paired region is used for replication and backups.
From my perspective, the most important thing about Azure Paired Regions is that they are placed in the same geographical region. For large countries like the USA, China, Australia or India, this means that the paired region will also be in the same country. However, for small countries this will mean that the paired region resides in another country.
For example, in Europe we have an Azure Region in Ireland and another one in the Netherlands. This means that, if we configure our Azure Service in the Netherlands to make a replica in the paired region (Ireland in this case), the data will leave the Netherlands and we will no longer be compliant with the local laws for a specific industry, such as healthcare.
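The pairing check described above can be sketched as a small lookup. The two dictionaries below contain only the examples mentioned in this article (Japan East and Japan West, plus the Ireland and Netherlands regions under their Azure names, North Europe and West Europe). The helper name `pair_stays_in_country` is hypothetical, and the tables should be completed from Microsoft's paired-regions page before anyone relies on them.

```python
# Data limited to the examples named in the article; extend it from
# Microsoft's paired-regions page before relying on it.
REGION_COUNTRY = {
    "Japan East": "Japan",
    "Japan West": "Japan",
    "North Europe": "Ireland",
    "West Europe": "Netherlands",
}

PAIRED_REGION = {
    "Japan East": "Japan West",
    "Japan West": "Japan East",
    "North Europe": "West Europe",
    "West Europe": "North Europe",
}


def pair_stays_in_country(region):
    """Return True if the paired region is in the same country as `region`."""
    paired = PAIRED_REGION[region]
    return REGION_COUNTRY[region] == REGION_COUNTRY[paired]


for region in ("Japan East", "West Europe"):
    print(region, "->", PAIRED_REGION[region], pair_stays_in_country(region))
```

A check like this is only as good as its data, so it is best treated as a reminder to look up the pairing, not as a compliance guarantee.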
Of course, there is an ongoing tradeoff between the cost and the risks that we want to cover. Often it is acceptable for the system to be down when an entire Azure Region goes down, because that means a disaster has happened. Alternatively, you could invest in replication and backup at another local supplier.
There are some industries where this might not be acceptable. For example, in healthcare, if a disaster happens, you need the system to be up and running because, in that moment, your system will be used by hospitals, which might make the difference between life and death.
If you want to find out what the paired region for a given Azure Region is, you should check the Microsoft page, which is updated whenever the Azure Paired Regions change.
When we talk about Azure Storage, please keep in mind that, being a core service, it is very often used by other services as well. For other Azure Services, like Azure VMs, which require Azure Storage for their disks, you need to use the right configuration of the underlying core services too.
There are four types of storage replication that are currently supported by Azure Storage:
Locally redundant storage (LRS)
Zone-redundant storage (ZRS)
Geo-redundant storage (GRS)
Read-access geo-redundant storage (RA-GRS)
LRS is the only one that replicates the storage only in the same region where you create and persist the content.
You should be very careful when you want to use ZRS and you have country constraints. ZRS replicates your content in multiple facilities, which can be in the same Azure Region or in another Azure Region. In this context, you should check whether the other facilities where your content is replicated are in the same Azure Region or in another one (the Paired Region).
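The replication types can be summarised as a simple classification. This is a minimal sketch that follows the descriptions in this article, not an official Azure API: the `Residency` enum and the `residency_of` helper are hypothetical names, and RA-GRS is treated like GRS because it is the read-access variant of geo-redundant storage.

```python
from enum import Enum


class Residency(Enum):
    STAYS_IN_REGION = "data stays in the region"
    VERIFY = "check where the facilities are"
    LEAVES_REGION = "data is replicated to another region"


# Classification follows the article's description of each replication type.
REPLICATION_RESIDENCY = {
    "LRS": Residency.STAYS_IN_REGION,
    "ZRS": Residency.VERIFY,
    "GRS": Residency.LEAVES_REGION,
    "RA-GRS": Residency.LEAVES_REGION,
}


def residency_of(replication):
    """Map a replication type name (case-insensitive) to its residency class."""
    key = replication.strip().upper()
    if key not in REPLICATION_RESIDENCY:
        raise ValueError(f"unknown replication type: {replication!r}")
    return REPLICATION_RESIDENCY[key]


for name in ("LRS", "ZRS", "GRS", "RA-GRS"):
    print(f"{name}: {residency_of(name).value}")
```

Unknown names raise an error on purpose: with country restrictions in play, an unrecognised replication type should stop the review instead of silently passing it.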
Several features help Azure SQL Database customers avoid losing data and keep the Recovery Time Objective (RTO) to a minimum.
The first feature is Point in Time Restore. This is an automatic backup of your Azure SQL Database that can be used to restore it. These backups are kept for between 7 and 35 days (depending on your tier). The backup is done in the same Azure Region as your Azure SQL Database.
The second feature is Geo-Restore, which is similar to Point in Time Restore, but which relies on Geo-redundant storage (GRS) of Azure Storage. This means that the backup of your database is replicated in another Azure Region that might be in a different country.
The third feature is Active Geo-Replication, which is one of the most powerful out-of-the-box replication features on the market. You can have up to 4 replicas in different Azure Regions that are kept in sync with your main database. Of course, if you have any country restrictions, you should be aware of which regions you are using when you configure the replicas.
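Before configuring replicas, the planned regions can be checked against the regions that our restrictions allow. The sketch below is a plain list check and does not call any Azure API. The function name, the region lists and the way the limit of 4 replicas (taken from the paragraph above) is enforced are all assumptions made for illustration.

```python
MAX_SECONDARIES = 4  # limit quoted in the article


def disallowed_secondaries(secondary_regions, allowed_regions):
    """Return the planned secondary regions that are not in the allowed set."""
    if len(secondary_regions) > MAX_SECONDARIES:
        raise ValueError(f"at most {MAX_SECONDARIES} secondaries are supported")
    return [r for r in secondary_regions if r not in allowed_regions]


# Hypothetical plan: the primary is in Japan and data must stay in Japan.
allowed = {"Japan East", "Japan West"}
planned = ["Japan West", "West Europe"]
print(disallowed_secondaries(planned, allowed))
```

An empty result means every planned replica stays inside the allowed regions.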
If you are using Azure DocumentDB, then you should know, from the beginning, that the backup is done automatically by Microsoft, via a built-in service. This backup service uses geo-redundant backups in the paired Azure Regions. It means that, if you have any kind of country constraints and you are from Europe or Brazil, then you should double check what data you are storing there.
Using Azure Data Factory, we can create custom backups and store them in our own Azure Storage, which can be in the same Azure Region as our DocumentDB instance.
I put HDInsight on the radar because, very often, people use a Hadoop system not only for processing data, but also for storing data.
The best part about HDInsight, as far as backups are concerned, is that they all rely on standard Hadoop (HBase,...) 'ways', where we have full control of our backups.
The Azure Storage accounts used by this kind of cluster are specified during the cluster creation flow. These storage accounts use Locally redundant storage (LRS).
Geo-Replication (https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hbase-geo-replication/) is supported in this kind of cluster, allowing us to have replicas in other Azure Regions. If we have country restrictions, then we should be careful with Geo-Replication when the Azure Paired Region is not in the same country (similar to geo-replication for Azure SQL Databases).
This new service, which connects the devices in the field with our backend, is becoming more and more powerful, and new features are added with each release.
From the disaster recovery perspective, Azure IoT Hub is not an out-of-the-box solution at the moment. We will need to take some action ourselves if we want to fail over from one instance of Azure IoT Hub to another.
The good part is that, from this perspective, if we have any country restrictions we are perfectly fine. All content stays in the same Azure Region as our primary instance. There are no backups or replicas of our data done in another region.
At the moment, Azure Service Fabric can run in only one Azure Region. As with Azure IoT Hub, there are no replicas or backups done automatically in other Azure Regions.
Both full and incremental backups that happen inside Azure Service Fabric are done using storage from the same Azure Region as our cluster. When we are doing backups, we can specify the Azure Storage where the backups should persist. Because of this, in the context of country restrictions, it is our responsibility to use Azure Storage accounts that have only LRS activated, since LRS is the only replication type that keeps the content in the same region.
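// Backup callback: copies the local backup folder to an external store.
// externalBackupStore and GetBackupId() are defined elsewhere in the service.
private async Task<bool> BackupCallbackAsync(BackupInfo backupInfo, CancellationToken cancellationToken)
{
    Guid backupId = GetBackupId();

    // Upload to storage that we control; its region and replication type
    // decide where the backup data ends up.
    await externalBackupStore
        .UploadBackupFolderAsync(
            backupInfo.Directory,
            backupId,
            cancellationToken);

    // Report that the backup folder was moved to the external location.
    return true;
}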
This new service is still in preview and allows us to store and process petabytes of data. The warehouse creates backups of your data in Azure Storage and persists them in case of a failure.
The data is stored in Locally redundant storage (LRS), meaning that your data does not leave the current Azure Region.
In one of the preview versions of SQL Data Warehouse, there was support for Geo-Restore, by using RA-GRS of Azure Storage. It seems that this feature is not available anymore.
For services that are in preview especially, we should keep an eye on the changes that are happening, even though they should not affect us very often.
The story is pretty simple for this service. Behind the scenes, we specify the Azure Storage that is used for "Play on Demand". In this case we are free to use any kind of Azure Storage, even LRS, which is not replicated in other Azure Regions.
All events that are sent to an instance of Azure Event Hub are stored only in the region where our instance was created. For this service, we do not have any kind of problem related to country restrictions.
At the moment, for this service, there is no feature that would allow us to back up or replicate our secrets in another Azure Region. That said, the scenarios where people would need such a replica or such backups are not very common.
There is no manual replication or backup done in another Azure Region.
Replication is done out-of-the-box, automatically, in the same geographical region. In this context we should check if it is in the same country or not.
For Service Bus, the story is simple, just like in the Azure Event Hub case. All messages persist only in the original Azure Region and there are no backups done in different regions. There is no support for a namespace to span multiple Azure Regions.
Geo-Replication can be done only manually, by having two different namespaces (endpoints) in two different regions.
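The manual approach with two namespaces can be sketched as a send-with-fallback helper. This is a minimal in-memory illustration, not the Service Bus SDK: `send_with_fallback`, the two sender callables and the use of `ConnectionError` as the failure signal are assumptions standing in for real clients bound to two namespaces.

```python
def send_with_fallback(message, send_primary, send_secondary):
    """Try the primary namespace first; fall back to the secondary on failure.

    `send_primary` and `send_secondary` are hypothetical callables that stand
    in for real senders bound to two namespaces in two regions.
    """
    try:
        send_primary(message)
        return "primary"
    except ConnectionError:
        send_secondary(message)
        return "secondary"


# In-memory stand-ins for the two namespaces.
primary_queue, secondary_queue = [], []


def broken_primary(message):
    raise ConnectionError("primary namespace unreachable")


print(send_with_fallback("order-1", primary_queue.append, secondary_queue.append))
print(send_with_fallback("order-2", broken_primary, secondary_queue.append))
```

With country restrictions, both namespaces would need to live in regions that satisfy them, otherwise the fallback path itself moves messages out of the country.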
This service allows us to create backups using Redis persistence. It means that a backup can be created in Azure Storage and restored later on.
The Azure Storage needs to be from the same Azure Region, but we are not restricted to using only Locally redundant storage (LRS). This means that we need to be aware of what kind of storage we are using when we have country restrictions for our data.
This is the kind of service that I dream about all the time. Without any kind of storage restrictions, you can use as much storage as you need. This is the perfect ocean for you to store and analyze your data.
When you create an Azure Data Lake, you need to specify a location, an Azure Region, where all the content will be stored. That is the place where your ocean of data is created.
Based on the current information, it is not clear (to me at least) whether there are any features for replication across other regions. Because we are talking about large amounts of data, I assume that it is not intended to replicate the content across Azure Regions (and this is a good thing). In this context, my assumption is that Azure Data Lake is safe to use if there are any country restrictions. However, a check with Microsoft representatives would be a good idea before jumping into the ocean.
There are many services in Azure that offer different replication and backup mechanisms in the same Azure Region or across other regions. It is important to remember that we should check each service when we have special restrictions or requirements.
These kinds of requirements, like country regulations for data, are not common, and very often we can find other ways to be compliant with them, such as encryption or splitting data based on information type. I will talk about other ways in which we can solve this problem in future articles.