TIL: availability zone names don’t matter

Say you have access to two separate AWS accounts, and say you have EC2 instances running in a certain region and availability zone, e.g. eu-west-1a, in both accounts. Today I learned, to my great surprise, that despite sharing the same name, they may actually be two totally different locations. Intrigued? Read on!

Yesterday there was a minor outage in AWS, where the EC2 service in one availability zone in the eu-west-1 region was degraded:

[RESOLVED] Network connectivity issues

[06:17 AM PDT] We are investigating network connectivity issues for some instances in a single Availability Zone in the EU-WEST-1 Region.

[06:37 AM PDT] Network connectivity has been restored for the vast majority of the affected instances in a single Availability Zone in the EU-WEST-1 Region. Some EBS volumes within the affected Availability Zone are also experiencing degraded performance. We continue to work towards full recovery.

[07:35 AM PDT] Starting at 5:35 AM PDT we experienced power and network connectivity issues for some instances, and degraded performance for some EBS volumes in a single Availability Zone in the EU-WEST-1 Region. By 6:00 AM PDT, power and networking connectivity had been restored for affected instances and by 6:31 AM PDT, degraded performance for affected EBS volumes had been resolved. By 7:08 AM PDT, the vast majority of affected instances had fully recovered. The small number of remaining instances are hosted on hardware which was adversely affected by the loss of power. While we will continue to work to recover all affected instances and volumes, for immediate recovery, we recommend replacing any remaining affected instances or volumes if possible. The issue has been resolved and the service is operating normally.

We had machines impacted in two accounts, but nothing serious: there was no visible impact on customers and the services involved healed themselves, a very nice job by our team 🙂 The last update in the report above, though, made us raise an eyebrow: in one account we had three EC2 instances killed in eu-west-1a, while in the other we had two killed in eu-west-1b. So why was AWS saying that the outage affected only one availability zone when we could see at least two? We asked ourselves, and then we asked them, too.

They explained, and pointed us to the official documentation: the same availability zone may be named differently across accounts! It’s not the name that matters so much as the availability zone ID. It’s not a secret, nor were they doing anything shady: quite simply, we didn’t know!

To ensure that resources are distributed across the Availability Zones for a Region, we independently map Availability Zones to names for each AWS account. For example, the Availability Zone us-east-1a for your AWS account might not be the same location as us-east-1a for another AWS account.

To coordinate Availability Zones across accounts, you must use the AZ ID, which is a unique and consistent identifier for an Availability Zone. For example, use1-az1 is an AZ ID for the us-east-1 Region and it has the same location in every AWS account.

Running the following command in each account confirmed it. In one account eu-west-1a mapped to the zone ID euw1-az2, and in the other eu-west-1b did:

aws ec2 describe-availability-zones \
    --region eu-west-1 \
    --query 'AvailabilityZones[*].{ZoneName:ZoneName,ZoneId:ZoneId}'
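If you want to line up the two accounts side by side, something like the following Python sketch may help. It assumes you saved the JSON printed by the command above for each account (so the CLI output format is the default `json`); the function names `zones_by_id` and `compare_accounts` are made up for this example, and the sample data contains only the two mappings mentioned in this post.

```python
import json


def zones_by_id(describe_output):
    """Map ZoneId -> ZoneName from the JSON list printed by the command above."""
    return {zone["ZoneId"]: zone["ZoneName"] for zone in json.loads(describe_output)}


def compare_accounts(output_a, output_b):
    """Return {zone_id: (name_in_a, name_in_b)} for zone IDs seen in both accounts."""
    a = zones_by_id(output_a)
    b = zones_by_id(output_b)
    return {zone_id: (a[zone_id], b[zone_id]) for zone_id in sorted(a.keys() & b.keys())}


# Only the mappings stated in this post; a real run lists every zone in the region.
account_a = '[{"ZoneName": "eu-west-1a", "ZoneId": "euw1-az2"}]'
account_b = '[{"ZoneName": "eu-west-1b", "ZoneId": "euw1-az2"}]'

for zone_id, (name_a, name_b) in compare_accounts(account_a, account_b).items():
    marker = "same name" if name_a == name_b else "DIFFERENT names"
    print(f"{zone_id}: {name_a} vs {name_b} ({marker})")
```

Keying on the zone ID rather than the name is the whole point: the ID is what stays the same across accounts, so it is the only safe thing to join on.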

This really came as a surprise to us, and it may be a surprise for many others. Hence this post. Enjoy!
