February 26, 2019

Amazon ECS Blue-Green Deployments

With hardware becoming more and more cost-effective, blue-green deployments are becoming common. Blue-green deployments allow customers to validate an entirely new version of an application without disturbing production traffic at all.

In a blue-green deployment, two versions of the application coexist. The blue version (production version) is the application actually receiving production traffic, whereas the green version (stage version) is idle and available for testing.

A new version of the application is always created as the green version. Developers can test this version thoroughly. Once it is verified and safe to promote, it can become the production version almost instantaneously.

At that point, developers can choose to downsize the older version of the application or keep it around for some time. If required, the older version can be swapped back into production.

Benefits Of Blue-Green Deployment

  • If issues are found with the new version (green version), it remains available for troubleshooting while production traffic continues uninterrupted, because the original version is untouched.
  • Almost zero downtime, because switching between the prod and stage applications is almost instantaneous. The routes are simply updated.

Great news: Harness now provides first-class support for Amazon ECS blue-green deployments. To better suit customers' needs, Harness provides two different flavors of ECS blue-green deployment:

  1. Using Elastic Load Balancer
  2. Using Route53 DNS update

Using Elastic Load Balancer (ELB)

With Elastic Load Balancer (ELB) configured for ECS, you can have multiple versions of the service running behind the load balancer. To achieve ECS blue-green deployment, a Harness user just needs to use two different Target Groups, each with its own listener. A Target Group is a logical grouping of one or more registered instances of the ECS service.

In this topology, two listeners are defined for the Elastic Load Balancer (ELB), each listening on a different port. The listener on the production port forwards traffic to the Target Group associated with the production service, while the other listener forwards traffic to the Target Group associated with the stage service.

Let's See How Exactly it Works

Before Deployment Starts
Consider the following scenario:
- The current Amazon ECS service version serving prod traffic is “Service-0”
- “Service-0” is configured with “Target-Group-0”
- The listener on port 80, for example, forwards traffic to “Target-Group-0”, which is associated with “Service-0”. So {HTTP:80} is the Production-Listener, and it forwards traffic to the production ECS service, “Service-0”

Current system snapshot:

Current System Snapshot - Elastic Load Balancer

Create a New Version of the Service
Now, the user wants to deploy a new version of the ECS service, “Service-1”:
- The user configures another Target Group for this new service, say “Target-Group-1”
- The listener on port 8080, for example, is configured to forward traffic to this Target Group. So {HTTP:8080} is the Stage-Listener, and it forwards traffic to the stage ECS service
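In AWS API terms, attaching the new service to its own Target Group is part of the service definition. The sketch below builds the keyword arguments one might pass to boto3's `ecs.create_service` call. The cluster name, task definition, container name, container port, and ARN are hypothetical placeholders, and this is only an illustration of the idea, not necessarily how Harness performs the step.

```python
def green_service_request(cluster, service_name, task_definition,
                          target_group_arn, container_name, container_port,
                          desired_count=2):
    """Build kwargs for ecs.create_service that register the new
    service's tasks with the stage Target Group."""
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        "loadBalancers": [
            {
                "targetGroupArn": target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ],
    }


# Every value below is a hypothetical placeholder.
request = green_service_request(
    cluster="demo-cluster",
    service_name="Service-1",
    task_definition="demo-task:2",
    target_group_arn="arn:aws:elasticloadbalancing:region:account:targetgroup/Target-Group-1/placeholder",
    container_name="web",
    container_port=3000,
)

# With boto3 installed and AWS credentials configured, this could be sent as:
#   boto3.client("ecs").create_service(**request)
```

Because the production listener still points at “Target-Group-0”, creating the service this way does not touch production traffic.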

Note that production traffic is uninterrupted throughout this step.

System snapshot after deploying new version:

ECS Blue-Green Deployment - Elastic Load Balancer

Update Listener Rules
After “Service-1” is up, verified, and ready to be promoted to production, the following steps need to happen:
- The newly created service now becomes the production service (blue version), and the current production service becomes the stage service (green version)
- This swap is achieved by updating the listener rules associated with the ELB to forward production traffic to “Target-Group-1” and stage traffic to “Target-Group-0”
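The swap described above can be sketched against the ELBv2 API. The function below takes a client object shaped like `boto3.client("elbv2")` and uses only `describe_listeners` and `modify_listener`. It assumes each listener's default action is a plain forward to a single Target Group; the function names are ours, and this is a minimal sketch rather than the Harness implementation.

```python
def forward_target(listener):
    """Return the Target Group ARN of the listener's forward action."""
    for action in listener["DefaultActions"]:
        if action["Type"] == "forward":
            return action["TargetGroupArn"]
    raise ValueError("listener has no forward action")


def swap_listeners(elbv2, prod_listener_arn, stage_listener_arn):
    """Exchange the Target Groups behind the production and stage listeners."""
    response = elbv2.describe_listeners(
        ListenerArns=[prod_listener_arn, stage_listener_arn]
    )
    by_arn = {item["ListenerArn"]: item for item in response["Listeners"]}
    old_prod_tg = forward_target(by_arn[prod_listener_arn])
    old_stage_tg = forward_target(by_arn[stage_listener_arn])

    # Point production at the new service first, then stage at the old one.
    # These are two separate API calls, so the swap is not atomic.
    elbv2.modify_listener(
        ListenerArn=prod_listener_arn,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": old_stage_tg}],
    )
    elbv2.modify_listener(
        ListenerArn=stage_listener_arn,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": old_prod_tg}],
    )
    return {"production": old_stage_tg, "stage": old_prod_tg}
```

Calling the same function a second time swaps the listeners back, which is what makes a rollback to the older service cheap.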

System snapshot after listener rules are updated:

Listener Rules Updated


So now, any production traffic coming on {HTTP:80} of ELB is forwarded to “Target-Group-1” (a group of registered instances for “Service-1”). At this point, the user can decide whether to downsize the older service “Service-0” or keep it as is for some time.

How Easy it Looks in Harness

Harness now provides first-class support for ECS blue-green deployment using Elastic Load Balancer and takes care of switching listener rules once verification/approval is done. Harness also allows the user to perform various types of verification using Splunk, AppDynamics, New Relic, etc. Just follow these simple steps to create the ECS Blue/Green Workflow.

  1. Provide the Task Definition for the ECS service.
  2. Create a Harness ECS Blue-Green Workflow using ELB.
  3. Configure your Workflow Setup. Here, the user can select the Elastic Load Balancer, Production Listener, and Stage Listener. Harness will determine the Target Group associated with the Stage Listener and use it with the new ECS service being created.
  4. Harness will update the listeners to switch to the new production ECS service once the new version is verified. Here is what the entire workflow execution looks like:

Thus, with just a one-time setup of the ELB, Target Groups, and listener rules in AWS, you can completely rely on Harness to perform any future ECS Blue/Green deployments.

ECS Blue-Green Deployment With Route 53 Update

Another way to swap production traffic from an older version of a service to the newer version is a Route 53 DNS swap. In this architecture, both of our services (blue and green) have a Service Discovery Service associated with them. This associates the services with URLs in a hosted DNS zone that was created when the namespace of the Service Discovery Services was created.

Let’s say that the services are associated with the records service_a.com (for the blue service) and service_b.com (for the green service). Furthermore, we have a record called service.com in another hosted zone. Amazon Route 53 lets the user create CNAME records for service.com with certain weights. These weights determine the percentage of traffic that lands on each of the services.
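To make the weight arithmetic concrete: with Route 53 weighted routing, each record's expected share of DNS responses is its weight divided by the sum of the weights of all records with the same name and type. The helper below is a small illustration of that calculation. The function name is ours, and actual traffic only approximates these shares because resolvers cache DNS answers.

```python
def traffic_shares(weights):
    """Map each record's weight to its expected fraction of DNS responses."""
    total = sum(weights.values())
    if total == 0:
        # Route 53 treats the all-zero case specially; this sketch does not model it.
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}


shares = traffic_shares({"service_a.com": 90, "service_b.com": 10})
print(shares)  # {'service_a.com': 0.9, 'service_b.com': 0.1}
```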

Initially, all of the traffic is going to the blue service. We bring up the green service and perform Continuous Verification on it.

If there are issues discovered by verification, no harm is done as our PROD traffic is not affected.

Once we are satisfied with the verification, we can start moving PROD traffic to the new version of the service. We could do it in one step or we could do it over multiple steps. Depending upon the ratio of traffic we want to send to the new version of the service, we could update the DNS records with appropriate weights. For example, we could send 10% of the traffic to the new service.

ECS Blue-Green Deployment Example 1
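A weight change like the 10% step above maps onto a single Route 53 change batch. The sketch below builds the `ChangeBatch` structure accepted by boto3's `route53.change_resource_record_sets`. The helper names, the "blue" and "green" set identifiers, and the 60-second TTL are illustrative assumptions, and the record names are the example names from the text.

```python
def weighted_cname_change(record_name, target, set_identifier, weight, ttl=60):
    """Build one UPSERT change for a weighted CNAME record."""
    if not 0 <= weight <= 255:
        raise ValueError("Route 53 weights must be between 0 and 255")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": set_identifier,  # distinguishes records sharing a name
            "Weight": weight,
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        },
    }


def shift_traffic_batch(record_name, blue_target, green_target, green_percent):
    """Build a ChangeBatch that sends roughly green_percent of lookups to green."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return {
        "Comment": f"Send about {green_percent}% of traffic to green",
        "Changes": [
            weighted_cname_change(record_name, blue_target, "blue",
                                  100 - green_percent),
            weighted_cname_change(record_name, green_target, "green",
                                  green_percent),
        ],
    }


batch = shift_traffic_batch("service.com", "service_a.com", "service_b.com", 10)

# With boto3 installed and AWS credentials configured, this could be applied as:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="<your hosted zone id>", ChangeBatch=batch)
```

Repeating the call with a larger `green_percent` gives the multi-step migration, and a final call with 100 moves all traffic to the green service.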


Eventually, we can migrate all of the PROD traffic to the green version and swap the tags on the services.

ECS Blue-Green Deployment Example 2


This can be achieved with great simplicity in Harness:

  1. Provide Task Definition for ECS Service
  2. Create Harness ECS BG workflow using DNS (Route 53)
  3. Set up your Workflow

We could have as many Change Route 53 Weights steps as we want. Let’s look at how the workflow looks in Harness.

Blue-Green With Route 53


Salient Features of Harness ECS Blue-Green Deployments

There are multiple ways of implementing blue-green deployments; however, there are certain salient features in the Harness design and implementation:

  1. No stage traffic to PROD: In certain implementations, once the deployment has been completed, the PROD (blue) service also ends up handling the stage (synthetic) traffic. This is potentially dangerous, because if some error causes the stage traffic to spike or misbehave in some way, production might be affected. This danger is present not just during the deployment, but also when no deployment is happening. In the Harness design, stage traffic is kept away from the PROD service.
  2. Continuous Verification (CV) before traffic swap: In some implementations, there is only limited verification of the green service before the traffic swap occurs. For example, this might be as simple as a health check validation of some target group. With Harness, we have the full power of machine learning-driven CV before we begin to swap traffic.

Satyam Shanker & Adwait A Bhandare
