AWS RDS 101: Multi-AZ vs. Read Replicas

As with most of my other blog posts, I love documenting key takeaways from my learning journey. One of those takeaways is the difference between RDS Multi-AZ deployments and Read Replicas. While the distinction may be obvious to some, it may not be for people who are new to AWS services and technologies.

As a budding Solutions Architect, I find myself questioning which design to use. If I’m designing a resilient architecture, should I be using a Multi-AZ deployment, or should I be spinning up a Read Replica? Or both? Are these the same and if not, when do I use what?

Before diving into the differences, allow me to clarify some of the common terms that you will come across.

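An Availability Zone (AZ) consists of one or more discrete, isolated data centers with redundant power, networking and connectivity within a Region. AZs are physically separated from one another, for example across different fault lines and flood plains, so that a single local disaster is unlikely to take out more than one of them.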

Regions are physical locations around the world where AWS clusters data centers. Each AWS Region consists of multiple isolated and physically separated AZs within a geographical area.

Amazon Relational Database Service (RDS) is a managed relational database service that comes in six database engine flavours (Aurora, MySQL, MariaDB, Oracle, Microsoft SQL Server and PostgreSQL). Being a managed service means that routine tasks such as provisioning, patching, backup, recovery, failure detection and repair are handled by AWS.

Now that we’ve cleared up what these terms are, let’s first discuss what an RDS Multi-AZ design would look like.

RDS Multi-AZ Deployment

So what is a Multi-AZ deployment?

A Multi-AZ deployment provides availability for your database instances within a single geographical region. You have two database instances: the Primary instance situated in one AZ and the Standby instance situated in a different AZ. This is depicted in the diagram above.

The data is synchronously replicated from the Primary to the Standby instance. A write is acknowledged only once it exists on both instances, which means that the data is highly durable and both instances are kept in sync.
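As a rough sketch of how a Multi-AZ deployment might be requested in practice, the snippet below builds the arguments for boto3's `create_db_instance` call with `MultiAZ=True`. The identifier, engine, instance class, storage size and credentials are placeholder assumptions of mine, and the actual AWS call is left as a comment so the snippet runs without an AWS account.

```python
def multi_az_instance_params(identifier: str, master_password: str) -> dict:
    """Build keyword arguments for boto3's rds.create_db_instance.

    All values except MultiAZ are illustrative placeholders.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",                  # assumption: any RDS engine could be used
        "DBInstanceClass": "db.t3.micro",   # assumption: placeholder instance class
        "AllocatedStorage": 20,             # GiB, placeholder
        "MasterUsername": "admin",
        "MasterUserPassword": master_password,
        "MultiAZ": True,                    # ask RDS for a standby in another AZ
    }


params = multi_az_instance_params("blog-demo-db", "change-me-please")
print(params["MultiAZ"])

# With AWS credentials configured, the request would look roughly like:
#   import boto3
#   boto3.client("rds").create_db_instance(**params)
```

The only setting that matters for this discussion is `MultiAZ`; everything else is what any single-instance deployment would need anyway.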

Failover Process

When there is an outage of your database instance, Amazon RDS will trigger an automatic failover to the Standby instance, which is situated in another AZ. There are several situations that can trigger this automatic failover:

  1. The Primary instance fails
  2. An Availability Zone outage occurs
  3. The database instance class (server type) is changed
  4. The operating system of the database instance is undergoing software patching
  5. A manual failover of the database instance has been initiated

The beauty of an automatic failover process lies in the fact that you can resume database operations as quickly as possible without manual administrative intervention. There is also an option to trigger a failover manually by rebooting the instance with failover, typically when you need to do maintenance or make changes.
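The manual failover mentioned above can be sketched with boto3's `reboot_db_instance` call and its `ForceFailover` flag. The instance name is a placeholder, and `RecordingClient` is a hypothetical stand-in for `boto3.client("rds")` that I use here only so the example runs without touching AWS.

```python
def force_failover(rds_client, identifier: str) -> dict:
    """Reboot a Multi-AZ instance and ask RDS to fail over to the Standby."""
    return rds_client.reboot_db_instance(
        DBInstanceIdentifier=identifier,
        ForceFailover=True,
    )


class RecordingClient:
    """Hypothetical stand-in for boto3.client("rds"); it only records calls."""

    def __init__(self):
        self.calls = []

    def reboot_db_instance(self, **kwargs):
        self.calls.append(kwargs)
        return {"DBInstance": {"DBInstanceIdentifier": kwargs["DBInstanceIdentifier"]}}


fake_client = RecordingClient()
force_failover(fake_client, "blog-demo-db")
print(fake_client.calls)

# Against a real account you would pass boto3.client("rds") instead of the fake.
```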

Now let’s take a look at what RDS Read Replicas are.

RDS Read Replicas

What are Read Replicas?

Read replicas are a special type of database instance that is primarily used to offload reads from the primary database instance. You can reduce the load on the primary instance by routing read queries from your application to the read replicas. This is useful if you have applications that are read-intensive, for example reporting applications. A Read Replica operates strictly as a database instance that accepts only read-only connections.
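To make the routing idea concrete, here is a minimal sketch of an application-side router that sends reads to the replicas in round-robin order and everything else to the primary. The endpoint names are made up, and the "starts with SELECT" check is a deliberately naive assumption; a real application would more likely rely on its database driver or framework to separate read and write connections.

```python
import itertools


class ReadWriteRouter:
    """Pick a database endpoint for a statement (illustrative only)."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # Cycle through the replicas so reads are spread evenly.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        # Writes, and reads when no replica exists, go to the primary.
        return self.primary


router = ReadWriteRouter(
    "primary.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))
```

Note that the router never sends a write to a replica, which matches the read-only nature of Read Replicas described above.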

How is the data kept in sync?

In this design, Amazon RDS uses asynchronous replication to update the read replica whenever there are changes made to the primary database instance. This is different from the Multi-AZ design where the data between the Primary and the Standby Replica is synchronously replicated.

Failover Process

You can promote a Read Replica into a standalone database instance.

One situation in which you might want to do that is when you need to implement failure recovery. When the primary instance fails, you can promote a read replica to take over as the new primary database instance.

However, do keep in mind that because the replication between the primary instance and the read replica is asynchronous, the data is only copied to the read replica after it has been written to the primary database instance. Hence, there are some limitations: in the event of a failover and the subsequent promotion of a read replica, there may be some gaps in the data set (i.e. database transactions that were committed on the primary instance may be missing on the replica).
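The gap described above can be illustrated with a toy model. This is not how RDS replicates internally; it is only a sketch in which each instance is a plain list of transaction names, and the replica is assumed to have applied just the first two transactions at the moment the primary fails.

```python
def missing_after_promotion(primary_log: list, replica_log: list) -> list:
    """Transactions committed on the primary but not yet applied on the replica."""
    return primary_log[len(replica_log):]


primary_log = []
for txn in ["t1", "t2", "t3", "t4"]:
    # On the primary, a commit returns without waiting for the replica.
    primary_log.append(txn)

# Asynchronous replication: the replica lags and has applied only part of the log.
replica_log = list(primary_log[:2])

# Synchronous replication (the Multi-AZ case): the standby has every commit.
standby_log = list(primary_log)

print(missing_after_promotion(primary_log, replica_log))  # ['t3', 't4']
print(missing_after_promotion(primary_log, standby_log))  # []
```

If the replica were promoted at this point, `t3` and `t4` would be the missing transactions, whereas the synchronously replicated standby has nothing to catch up on.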

The failover process is not automatic (unlike Multi-AZ), but it is pretty straightforward. You can easily promote the read replica into a standalone instance via the Amazon RDS console.
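The same promotion can also be scripted. The sketch below uses boto3's `promote_read_replica` call; the replica name is a placeholder, and `RecordingClient` is a hypothetical stand-in for `boto3.client("rds")` so that the example runs without an AWS account.

```python
def promote_replica(rds_client, replica_identifier: str) -> dict:
    """Promote a read replica to a standalone, writable database instance."""
    return rds_client.promote_read_replica(DBInstanceIdentifier=replica_identifier)


class RecordingClient:
    """Hypothetical stand-in for boto3.client("rds"); it only records calls."""

    def __init__(self):
        self.promoted = []

    def promote_read_replica(self, **kwargs):
        self.promoted.append(kwargs["DBInstanceIdentifier"])
        return {"DBInstance": {"DBInstanceIdentifier": kwargs["DBInstanceIdentifier"]}}


fake_client = RecordingClient()
promote_replica(fake_client, "blog-demo-db-replica-1")
print(fake_client.promoted)

# Against a real account you would pass boto3.client("rds") instead of the fake.
```

Since the promoted replica becomes its own standalone instance, the application still has to be pointed at it, which is part of why this path is a manual one.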

Differences

Now that we have discussed what RDS Multi-AZ deployments and Read Replicas are, the architectures do seem similar, don’t they? They both have another database instance sitting in a separate AZ and, in some sense, the Read Replicas seem to be “multi-AZ” because of that. This may be confusing to some, as it appears that both designs function the same way.

Well, that is not the case.

One difference that I’ve highlighted earlier is that in a Multi-AZ deployment the data is replicated synchronously, so the Primary and the Standby hold the same data at any given time. This is not the case for read replica designs, as the data between the primary and the read replica instance is replicated asynchronously.

Another difference is that you cannot use a Standby instance in a Multi-AZ design to serve read traffic as it is only used for failover. Therefore, the Multi-AZ deployment is not a read-scaling solution. If you need to serve or offload read traffic, you’ll need to use a read replica instead.

One simple way to understand when to use what, is this:

If you need to design a solution where you are focused on scalability (you need to scale the reads and decrease load on the primary instance), you should look into implementing read replicas. At the time of writing, Amazon RDS supports up to 5 read replicas per database instance (for MySQL, MariaDB, PostgreSQL, Oracle and SQL Server); the limit depends on the engine and has changed over time, so check the current documentation.

If you need to implement a design where availability is the main concern, you should consider using a Multi-AZ deployment. 


With that being said, architecting and deciding what to include in your design is unfortunately not a black or white situation – there are a myriad of factors that should also be taken into consideration, depending on the customer’s requirements.

Summary

The good news is, while these two concepts are different, they can be complementary. Read replicas can be used in conjunction with Multi-AZ deployments to provide a resilient disaster recovery strategy for your database. You can configure a source database as Multi-AZ to achieve high availability, and at the same time, spin up read replicas for read scalability purposes.
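As a sketch of how the two features might be combined, the snippet below builds the arguments for two boto3 calls: `modify_db_instance` to turn an existing source instance into a Multi-AZ deployment, and `create_db_instance_read_replica` once per replica. The instance names and the replica naming scheme are my own placeholders, and the AWS calls themselves are left as comments so the snippet runs without credentials.

```python
def multi_az_update_params(source_id: str) -> dict:
    """Arguments for rds.modify_db_instance to enable Multi-AZ on the source."""
    return {
        "DBInstanceIdentifier": source_id,
        "MultiAZ": True,
        "ApplyImmediately": True,  # assumption: do not wait for the maintenance window
    }


def read_replica_params(source_id: str, count: int) -> list:
    """Arguments for rds.create_db_instance_read_replica, one dict per replica."""
    return [
        {
            "DBInstanceIdentifier": f"{source_id}-replica-{n}",  # hypothetical naming
            "SourceDBInstanceIdentifier": source_id,
        }
        for n in range(1, count + 1)
    ]


source_update = multi_az_update_params("blog-demo-db")
replicas = read_replica_params("blog-demo-db", 2)
print(source_update)
print([r["DBInstanceIdentifier"] for r in replicas])

# With AWS credentials configured, roughly:
#   import boto3
#   rds = boto3.client("rds")
#   rds.modify_db_instance(**source_update)
#   for params in replicas:
#       rds.create_db_instance_read_replica(**params)
```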

There are a ton of data loss mitigation and resiliency strategies, so this is barely scratching the surface. If you’re interested in finding out more, feel free to browse this link.

Also, I may cover more about Amazon RDS in the future, so stay tuned!
