Let's cover canary and blue/green release methodologies and introduce you to Continuous Verification.
In our previous article, Continuous Delivery 101, we covered the basic concepts of what CD is. If you haven’t read it, I strongly recommend you read that first. This article is going to expand into more advanced concepts built off the first article. Specifically, I’m going to cover canary and blue/green release methodologies and introduce you to the concept of continuous verification.
In our 101 article, I discussed a concept called release strategy. A release strategy is a method of releasing software into production that reduces risk and exposure to both customers and the business.
Why this matters
When releasing software, you’re exposing yourself to risk. Specifically, you’re exposing your customers and your business. The risk comes from poorly performant software. As good as your testing and QA might be, you still run the risk of a malfunctioning deployment, security issues, and performance/quality degradation. Yep, this stuff happens. And, it happens often. So, to offset this risk, organizations today leverage release methodologies that triage their exposure to risk in how they deploy their software.
A blue/green release is replicating a production environment and switching traffic between live environments as each non-live environment receives the later update. This process then repeats with the other environment as a new version is re-released.
The router will switch traffic once the deployment is complete in the new environment.
This method can pose both higher risk and higher cost. The risk is higher because an application problem is exposed to your entire user base. The ccost is high because you have to maintain an entire replica of your production environment, quickly increasing unnecessary cost. If you’re interested in further reading, here is a good introduction by Martin Fowler.
Canary is a method of slowly releasing in small stages or batches to only a select number of users, data centers, or random nodes within your deployment before a full roll-out. Canary is perhaps the most popular for the release methods as it is both effective and cost-efficient.
A canary release involves first selecting how many test groups you want to phase your deployment across. Then, determine how large each test group will be. If the deployment into the canary group is successful, the deployment will proceed. If it fails — for whatever reason — then the deployment will not continue, and the canary group will be rolled back to the previous version of the software.
If it isn’t obvious by now, a canary methodology is brilliant in ensuring the least amount of risk for both your users and your business. In catching application issues early, you remove the risk of having a faulty version of your software deployed across your entire infrastructure — definitely a no-no.
To illustrate a Canary deployment, let’s say your application — we’ll call it version 1.1 — is deployed across 50 nodes. If you’re using Kubernetes, these could also be pods.
You want to release a new version — version 1.2 — but do not want the entire deployment to go down if the release is faulty. So, we decide to only deploy across 10% of our nodes and observe the performance and quality carefully among these five nodes, in purple:
If the performance and quality of the five nodes are acceptable, we can then deploy across the remaining 45 nodes.
Keep in mind; you can also stagger the release into multiple canary deployments across the same environment. So, let’s say you want to deploy in 4 stages:
- Stage 1: 10%
- Stage 2: 20%
- Stage 3: 50%
- Stage 4: 100%
Your deployment would then look like this:
If you’re doing this manually, or have written custom scripts to achieve this, you may want to take a look at this video. Harness allows you to perform a canary test with a few configuration options, automatically. Harness also uses unsupervised machine learning to understand and analyze the performance metrics to conduct a canary release; here is a blog demonstrating one of the models used for this.
A concept that is gaining popularity — because it is so darn critical — within mature CD conversations is a technique called Continous Verification. When software is released into an environment (e.g., stage or production), the performance and quality of that software must be tested. Usually, the performance and quality are tested using an APM and/or logging solution, such as:
- New Relic
Performing this verification step manually doesn’t scale and is limited to human error. Trying to automate this process with custom scripts is near impossible. So, modern CD solutions have solved for this critical step using Continuous Verification. CV checks the performance of the deployed service and automatically rolls back the deployment if it fails.
Here is a sequence diagram demonstrating exactly how CV works. In this case, I’m using Harness as an example to show how during the deployment phase, the service is tested using AppDynamics and Splunk. If it fails performance tests, Harness can automatically roll back the deployment.
How Harness performs Continuous Verification
Advanced Continuous Verification
Alright, verifying performance and quality is great during the release process. But, let’s be honest, shitty code can creep on us like a terrible meal: much, much later. The way that most companies hedge their risk is by investing in monitoring tools. The number of tools used by modern IT and development teams can easily reach up to 10-20 various vendors.
This raises the question: how do you monitor so many different services across so many different dependencies and environments? Consider the following diagram as a visual of the challenges that I’m referring to:
Orchestrating continuous monitoring of services among major performance monitors
How do we solve for this? Simple. Using the same algorithms for performing your CV, you turn it on to monitor around the clock. This is an extremely advanced CD technique that (pretty much) no one else on the planet can do, aside from Harness. Take a look at our 24/7 Service Guard feature. Service Guard orchestrates the monitoring of multiple services across their many dependencies. It’s a sophisticated technique of Continuous Verification and one of the most valuable capabilities that our customers love.
Eliminate After-Hour Releases
Imagine a scenario in which your team is ready to deploy a new version of your software. Using, for example, a canary release process you have an all-hands-on-deck release. Of course, this is happening during regular working hours. Or, in the least, during waking hours. Why is this a problem? It’s a problem because your team has to be available for the deployment process and you’re only able to deploy after-hours to minimize risk. This means your team is miserable. Utterly, and painfully, miserable.
By combining CD and CV, you can reach true continuous deployments and yet you still have the safety and security of deploying at any hour. You’re both automatically deploying and verifying with a safety harness (pun intended) of automatically rolling back if the deployment fails.
Between both the 101 and 102 articles, you should be well equipped to tackle your organization’s CD challenges. Equally as important, you understand from a high level the various terms and concepts that make up a modern CD platform. With Harness, these capabilities are baked into the core feature-set, allowing you to leverage the capabilities with minimal configuration. Once you’re up and running, you can build new pipelines in minutes with almost no code involved.
Keep an eye out for a third part to this series, CD 103. In the meanwhile, signup for a free trial.