Amazon Web Services – Providing Local and Geographical Redundancy
As I recently passed the AWS Certified Solutions Architect certification, I thought I’d take the opportunity to blog some of its services and the interesting and powerful ways to use them in conjunction. This blog will focus on how to provide redundancy for your applications within a single region and across geographical locations. This particular exercise was interesting to me because I have worked on similar projects in my professional career, using both on-premises and AWS infrastructure.
First and foremost, it’s imperative to understand a few AWS terms:
- Availability Zone – A single, isolated datacenter managed by Amazon.
- Region – A separate geographical location (Virginia, California, Oregon, Tokyo, Sydney, etc.). Every Region has at least two Availability Zones – most have more than two – and the Availability Zones within a Region are connected by high-speed, low-latency links.
- EC2 Instance – a virtual machine running an operating system.
- Elastic Load Balancer (ELB) – a virtual load balancer that can be internal-only or internet-facing. It can health-check instances and suspend or resume traffic to each instance depending on its health.
To set the stage, I created a very generic, fake website to emulate a mission-critical application that requires extremely high availability. The application requires redundancy within the Primary region and must be able to fail over to a Secondary region. The website is “aws.bryankrausen.com”, which may or may not be available if you try to hit it, depending on whether I have any instances running (I pay for my own account). The domain was registered on GoDaddy, and AWS Route 53 hosts the authoritative DNS zone. The instances hosting the website were created from the Amazon Linux AMI and are running Apache. I created a quick network diagram of the configuration.
The first requirement that I need to satisfy is to provide Local redundancy using multiple availability zones and an ELB within the same region. When deploying any application on the AWS platform, this should be your first instinct as AWS Availability Zones do sporadically have problems and can become unavailable.
To satisfy this requirement, I manually created (4) EC2 instances using the Amazon Linux AMI and selected a different AZ for each instance as shown above (I could have deployed this automatically using Auto Scaling, but I’ll save that for another post). During instance creation, I used a bash script (passed in as EC2 user data) to install Apache, install the latest security updates, copy the website from an AWS S3 bucket to the default Apache folder, start the Apache service, and finally ensure the service would start again in the event of a reboot. (Thanks to the ACloudGuru course for introducing me to this advanced setting.)
#!/bin/bash
yum install httpd -y       # install Apache
yum update -y              # apply the latest security updates
aws s3 cp s3://btkrausen/website /var/www/html/ --recursive   # copy the website into the default Apache folder
service httpd start        # start Apache
chkconfig httpd on         # restart Apache automatically after a reboot
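For readers who prefer the command line over the console, the same launch can be sketched with the AWS CLI. This is a hedged example, not the exact commands used in the post: the AMI ID, key pair, and security group names are placeholders, and it assumes the bootstrap script above is saved locally as bootstrap.sh.

```shell
# Launch one t2.micro per Availability Zone, passing the bootstrap
# script above as EC2 user data so each instance configures itself.
# ami-xxxxxxxx, my-keypair, and webserver-sg are placeholder values.
for az in us-east-1a us-east-1b us-east-1c us-east-1d; do
  aws ec2 run-instances \
    --image-id ami-xxxxxxxx \
    --instance-type t2.micro \
    --key-name my-keypair \
    --security-groups webserver-sg \
    --placement AvailabilityZone=$az \
    --user-data file://bootstrap.sh
done
```

Because the user data script does all of the configuration, each instance comes up serving the website with no manual steps.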
Now that I had (4) instances running my new application, I needed to ensure that traffic was both load balanced and configured for redundancy in the event of an instance or Availability Zone failure. This is effortlessly done by creating an Elastic Load Balancer. Like any other load balancer, the ELB provides a front end for client connectivity while dispersing traffic to back-end servers. It also provides health checks to verify that back-end servers are healthy and will automatically add or remove instances based on those pre-defined checks. Last but not least, unlike an on-premises deployment, the ELB provides a DNS name, not an IP, for front-end connectivity. In the example above, the DNS name for the ELB is “webserver-xxx.us-east-1.elb.amazonaws.com”.
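The ELB setup described above can also be sketched with the AWS CLI (these are the classic ELB commands of the era; the load balancer name, health-check path, and instance IDs are placeholders, not the ones from my account):

```shell
# Create a classic ELB spanning all four AZs, forwarding HTTP
# port 80 on the front end to port 80 on the instances.
aws elb create-load-balancer \
  --load-balancer-name webserver \
  --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" \
  --availability-zones us-east-1a us-east-1b us-east-1c us-east-1d

# Health check: poll each instance's home page every 30 seconds.
# Two consecutive failures pull an instance out of rotation;
# two consecutive successes put it back in.
aws elb configure-health-check \
  --load-balancer-name webserver \
  --health-check Target=HTTP:80/index.html,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2

# Register the four instances behind the load balancer.
aws elb register-instances-with-load-balancer \
  --load-balancer-name webserver \
  --instances i-aaaa1111 i-bbbb2222 i-cccc3333 i-dddd4444
```

The create-load-balancer call returns the generated DNS name, which is what clients (or DNS records) point at instead of an IP.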
*With this configuration, (4) separate, distinct datacenters (AZs) would have to fail before my application is unavailable to my customers.
Primary requirement fulfilled… fully redundant within a single region… but it’s not enough.
Now that redundancy and load balancing have been established in the Primary region, we need to ensure that the application is still accessible in the event the entire us-east-1 region is unavailable. The easiest way, assuming your application can handle it, is to mirror the Primary region’s configuration in a Secondary region. As shown above, I configured (2) identical instances in two separate Availability Zones within the AWS Oregon region to host my application. As in the Primary region, I also created and configured an ELB to manage both redundancy and traffic distribution. The DNS name for this ELB is shown above as webserver-xxx.us-west-2.elb.amazonaws.com.
At this point, I have two front-end load balancers serving two farms of application servers. I could easily create a CNAME (or an Alias record in Route 53) pointing to the ELB in my Primary region and likely never see downtime. However, in the event that the entire us-east-1 region became unavailable, it would require manual intervention (a DNS change) to make the application accessible from the Oregon region.
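For reference, the simple single-region option looks like this as a Route 53 change batch. This is a sketch: the hosted-zone IDs and the ELB DNS name are placeholders, and the alias-target zone ID must be the ELB’s own hosted zone ID, not your domain’s.

```shell
# Create an Alias A record for the site that targets the Primary
# ELB. Z1EXAMPLE is a placeholder for the domain's hosted zone;
# the AliasTarget HostedZoneId is the ELB's zone, also a placeholder.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "aws.bryankrausen.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "ZELBZONEID",
          "DNSName": "webserver-xxx.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'
```

This gets you the Primary region only, which is exactly why the next step replaces it with failover records.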
Manual intervention? I thought everything in the cloud was magic and fully automated….
To automate the failover from the Primary region to the Secondary region, we can utilize Route 53, AWS’ highly available and scalable DNS service, to create two Failover records for our application’s URL. The Primary Failover record will point to the ELB in us-east-1, while the Secondary Failover record will point to us-west-2. Both the Primary and Secondary Failover records can be tied to pre-defined health checks to determine whether the endpoints are functioning as intended. In our case, if the health check for the Primary region fails, DNS queries will be answered with the Secondary region’s ELB and traffic will be sent to the Secondary region. Additionally, once the Primary site is available again, traffic will automatically fail back to it.
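The failover records described above can be sketched with the AWS CLI as well. Again a hedged example rather than my exact configuration: the caller reference, hosted-zone IDs, and ELB DNS names are placeholders.

```shell
# Health check against the Primary ELB's home page.
aws route53 create-health-check \
  --caller-reference primary-elb-check-1 \
  --health-check-config Type=HTTP,FullyQualifiedDomainName=webserver-xxx.us-east-1.elb.amazonaws.com,Port=80,ResourcePath=/index.html

# Two failover Alias records for the same name: PRIMARY answers
# while its health check passes; SECONDARY answers when it fails.
# All zone IDs, DNS names, and the health-check ID are placeholders.
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{
    "Changes": [
      { "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "aws.bryankrausen.com", "Type": "A",
          "SetIdentifier": "primary", "Failover": "PRIMARY",
          "HealthCheckId": "HEALTHCHECK-ID-FROM-ABOVE",
          "AliasTarget": {
            "HostedZoneId": "ZELBZONEEAST",
            "DNSName": "webserver-xxx.us-east-1.elb.amazonaws.com",
            "EvaluateTargetHealth": true } } },
      { "Action": "CREATE",
        "ResourceRecordSet": {
          "Name": "aws.bryankrausen.com", "Type": "A",
          "SetIdentifier": "secondary", "Failover": "SECONDARY",
          "AliasTarget": {
            "HostedZoneId": "ZELBZONEWEST",
            "DNSName": "webserver-xxx.us-west-2.elb.amazonaws.com",
            "EvaluateTargetHealth": true } } }
    ]
  }'
```

Because the records share one name but carry different SetIdentifier and Failover values, Route 53 treats them as a primary/secondary pair, which is what makes both failover and automatic failback hands-off.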
*With this configuration, (6) separate, distinct (and now regionally separated) datacenters (AZs) would have to fail before my application is unavailable to my customers.
Secondary requirement fulfilled…automated failover to second region enabled.
As you can see, it’s fairly simple to enable a level of application availability that is unattainable for most businesses running on-premises infrastructure. For me personally, it’s even more exciting because I’ve built geographically redundant applications on-premises in my professional career, and it was so much harder. The time to order, receive, and rack the equipment, configure every aspect of the environment, and then do it all over again in a secondary datacenter was incredibly time consuming compared to deploying on AWS. By comparison, the configuration above took less than one hour to complete in its entirety, further proving that the cloud enables agility unseen in the majority of enterprise IT organizations.