What is a Deployment Pipeline?
A deployment pipeline is an automated process in which software artifacts are promoted and verified across one or more environments (stages) to ensure they are ready for delivery to end-users (customers).
Probably one of the most common questions we get from customers is, “How should I build my deployment pipeline?”
Every business, application, or service is different, so there is no single magical pattern for building a modern pipeline. Below are some questions and considerations to think about when designing and implementing a modern cloud-native deployment pipeline.
Deployment Pipeline Scope

A great place to start is to understand the scope of your deployment pipeline.
Simply put, what software artifacts will your pipeline deploy to your end-users and customers?
This is where you get into a heated debate about whether the scope of your pipeline is an application, a single service/microservice, or many services/microservices.
Think of an application as a logical group of services or microservices, each represented by a software artifact (e.g. AMI, Docker container, Lambda function, etc.).
For example, one of our customers has a single deployment pipeline for their SaaS application, one which is made up of 40 different microservices each represented by a Docker container artifact. We have another customer who has a pipeline per microservice (dev team), so patterns do vary.
Generally, if your microservices have dependencies, you typically want to manage everything through a single deployment pipeline. If your microservices are more loosely coupled, you can have many small independent pipelines, and if need be, you can always chain them together.
How you structure your deployment pipeline will often dictate how you govern, control, and manage deployments across your dev teams. You ultimately want to let developers deploy using one or more deployment pipelines; it's therefore important to create an appropriate scope and hierarchy so governance can be simplified.
Pipeline Stages & Environments

Once you identify your scope, you then need to map out the infrastructure (environments) required to promote and test your software artifacts. Each environment typically maps to a pipeline stage, with each stage having multiple steps (e.g. deploy, test/verify, rollback).
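To make that structure concrete, here's a minimal Python sketch of a pipeline whose stages map to environments, each with its own steps. The names and shape are illustrative, not any particular tool's API:

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    environment: str  # each stage maps to an environment
    steps: list = field(default_factory=list)  # e.g. deploy, test/verify, rollback

@dataclass
class Pipeline:
    name: str
    stages: list = field(default_factory=list)

# A typical dev -> QA -> production pipeline for a single service artifact
pipeline = Pipeline(
    name="checkout-service",  # hypothetical service name
    stages=[
        Stage("dev", ["deploy", "test/verify"]),
        Stage("qa", ["deploy", "test/verify", "rollback"]),
        Stage("production", ["deploy", "test/verify", "rollback"]),
    ],
)

print([s.environment for s in pipeline.stages])  # ['dev', 'qa', 'production']
```

The key design point is that the environment list is a property of the pipeline, so promotion order is explicit rather than implied.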
Most organizations have at least a dev, QA, and production environment to deploy/verify each software artifact. In a cloud-native world, each environment typically maps to a specific cluster (e.g. a Kubernetes cluster). It's also not unusual for many services/microservices to share an environment as opposed to having a dedicated environment per service/microservice.
Several of our customers have multi-cloud environments, meaning a single environment (e.g. QA) maps to many services and cloud providers. For example, a QA environment might map to microservice container A running in an AWS ECS cluster, as well as microservice B running in a GCP Kubernetes cluster.
However, the QA environment itself is managed as a single entity with its own environment variables, secrets, and configuration.
The fact that the QA environment is multi-cloud and hosts multiple services is completely transparent to the team or user executing the deployment pipeline. Having all services/microservices managed under a single environment also helps with testing dependencies vs. testing services in isolation from one another.
For example, iHerb was able to migrate to the public cloud using a multi-cloud architecture of Kubernetes, AWS, Azure, and GCP.
Deployment Strategies & Workflows
Now that your pipeline scope and stages are defined, you can start to add deployment logic (workflows) so service artifacts can be deployed/promoted across environments.
For each stage in your pipeline, you need to pick a deployment strategy or pattern. This will dictate how service artifacts are deployed across the nodes in your clusters and environments.
For example, for pre-production environments with minimal business impact, customers typically perform a basic deployment strategy (all nodes in the cluster updated at the same time) or a rolling strategy (nodes updated sequentially). This is the old/classic/vintage way of performing deployments out-of-hours 🙂
For production environments with high business impact, customers typically employ blue/green or canary patterns (below screenshot) so they can reduce/control the impact of a new deployment. Many of our customers have multi-phase canary deployments where they deploy to 5%, 10%, 25%, 50%, and 100% of their production clusters.
More and more customers are moving from Blue/Green to Canary deployments for two reasons: (1) reduced risk and (2) reduced cost/complexity. Instead of flipping 100% of traffic at once with Blue/Green, canaries let you shift traffic progressively. You also don't need to maintain and manage two production clusters/environments with canary, because you can scale up/down within a single production environment.
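A multi-phase canary like this boils down to a simple loop: deploy to a slice of traffic, verify, and only then widen. Here's a minimal Python sketch; the deploy/verify/rollback callbacks are hypothetical stand-ins for real deployment and monitoring integrations:

```python
def run_canary(phases, deploy, verify, rollback):
    """Progressively widen a rollout; roll back on the first failed phase."""
    for percent in phases:
        deploy(percent)          # shift <percent>% of traffic to the new version
        if not verify(percent):  # verification gate, e.g. APM/log checks
            rollback()
            return False         # rollout halted and reverted
    return True                  # new version is serving 100% of traffic

events = []
ok = run_canary(
    phases=[5, 10, 25, 50, 100],
    deploy=lambda p: events.append(p),
    verify=lambda p: True,       # pretend every phase verifies cleanly
    rollback=lambda: events.append("rollback"),
)
print(ok, events)  # True [5, 10, 25, 50, 100]
```

The verification gate between phases is what distinguishes a canary from a plain rolling deployment.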
Canary deployment is really the new de facto standard for Continuous Delivery because it allows developers and DevOps teams to deploy and test in production without significant business risk or impact.
Test Automation, Verification & Tooling
Once you’ve picked your deployment strategies the fun begins 🙂 This is where you get to test and break sh!t in your pipeline 🙂
The whole point of a deployment pipeline is to kill a new build, version, or release candidate. Why? Because you want your software artifacts to fail in pre-production versus in production with real customers.
You also want to get to a point where your pipeline is 100% automated and you start practicing Continuous Deployment. Simply put, your pipeline dictates whether a new build or version of your app/service is ready for production deployment vs. people making that decision.
There are many types of tests, verifications, and tools to help you achieve automated testing (aka Continuous Testing, Continuous Verification). Here’s a quick breakdown of what tests are generally included in a modern deployment pipeline:
Static Code Analysis – code is examined without running the app or service to identify code quality, standard, & security optimizations. Here are some tools.
Unit Tests – code is functionally tested at the component level. E.g. JUnit tests.
Integration Tests – code is functionally tested at the service level or with other APIs. Here’s a list of tools.
OSS Security, Dependency & License Tests – code is examined for open-source licenses/vulnerabilities. WhiteSource is an example tool in this space.
Smoke Tests – these are basic tests to check core functional use cases within the app or service.
Regression Tests – these are broader, deeper tests that exercise the overall functionality of an app or service.
Security/Vulnerability Test – similar to static code analysis but code is inspected for security flaws. Here’s a nice list of tools.
Browser Test – code is tested for browser compatibility, normally by 3rd party services. Tools like Selenium or Sauce Labs are popular.
Load & Performance Test – code is load tested for performance and scalability, normally in a pre-production environment. Here are some tools.
Canary Verification – deployments are verified to ensure they can proceed to the next canary phase or stage. An example.
Availability Test – app or service is pinged to see if it is available to the user. It’s an easy test to perform using synthetic transactions.
Performance Test – code is instrumented in production to identify performance anomalies & regressions. Here are some APM tools to evaluate.
Quality Test – code (logs) is checked for new errors or exceptions that might have been introduced. Splunk, Sumo Logic and ELK are popular tools.
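Whatever mix of the above you pick, the pipeline mechanics are the same: run each verification in order and kill the release candidate at the first failure. A rough Python sketch, where the check functions are hypothetical placeholders for calls into real tools:

```python
def run_verifications(checks):
    """Run each verification in order; the first failure kills the build."""
    for name, check in checks:
        if not check():
            return name   # the stage that rejected the release candidate
    return None           # all checks passed; build is promotable

# Illustrative checks; in practice each lambda would invoke a real tool
checks = [
    ("static-analysis", lambda: True),
    ("unit-tests",      lambda: True),
    ("integration",     lambda: False),  # simulate an integration failure
    ("load-test",       lambda: True),
]
print(run_verifications(checks))  # integration
```

Note the fail-fast behavior: the load test never runs once integration testing rejects the build, which is exactly the "kill it early" goal described above.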
In a cloud-native microservices world, it’s critical that you test upstream and downstream dependencies for impact. This is somewhat different from testing microservices in isolation and hoping everything will be OK.
For example, at Harness, we recently introduced a new capability called “Service Impact Analysis” that lets you test and verify microservices dependencies post-deployment by leveraging the data within your Application Performance Monitoring (APM) and Log Analytics tools.
Harness has 65+ DevOps tool integrations so you can integrate and embed your existing test and monitoring tooling into your deployment pipelines.
Rollback & Failure Strategies
“We never rollback, we always roll forward” — this is called not having a rollback or failure strategy.
Sure, there are times when you can probably hotfix production with a heroic 10-minute hack. In the real world, though, you need to ensure customers are not impacted by production deployments.
Risk and downtime can certainly be reduced by leveraging a canary deployment strategy. However, a canary phase failing in production will still impact a subset of your end-users. It's therefore important that you can roll back to the last working build or version of your app/service in seconds.
Rollback is also critical in complex deployments where you might be deploying many microservices at the same time and dependencies may exist between them. You therefore need to decide whether rollbacks are local or global across your deployment pipeline.
For example, let’s suppose we have a deployment pipeline with 3 microservices. We’ll deploy each of them using a 3-phase canary deployment workflow to 25%, 50%, and 100% of our users.
What happens if one of these microservices fails its canary phase? Do you rollback all 3 microservice deployments or just the failed microservice? This is known as local vs. global rollback.
For example, at Harness we let customers manage use cases like this through a concept called 'failure strategies'. When a deployment workflow fails for a service or artifact, you can specify what action you want to take. In the below screenshot you can make rollback manual, fully automatic, local, global, or even tolerant of failure when pipeline tests or verifications fail.
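Generically, a failure strategy is just a policy applied when one service fails its canary phase. Here's a sketch of local vs. global rollback in Python; the function and strategy names are illustrative, not any specific product's API:

```python
def on_canary_failure(failed_service, services, strategy, rollback):
    """Apply a failure strategy when one service fails its canary phase."""
    if strategy == "local":
        rollback(failed_service)   # roll back only the failed service
    elif strategy == "global":
        for svc in services:       # roll back every service in the pipeline
            rollback(svc)
    elif strategy == "ignore":
        pass                       # tolerate the failure and continue
    else:
        raise ValueError(f"unknown strategy: {strategy}")

services = ["svc-a", "svc-b", "svc-c"]
rolled_back = []
on_canary_failure("svc-b", services, "global", rolled_back.append)
print(rolled_back)  # ['svc-a', 'svc-b', 'svc-c']
```

Global rollback is the safer default when services share dependencies; local rollback avoids churning healthy services when they are truly independent.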
Failure is always an option with any production deployment. The more test coverage you introduce to your pipeline in the dev and QA stages, the less failure you'll see in production, but nothing is unbreakable. How you manage failure is critical in Continuous Delivery: everyone remembers production downtime, and rarely the flawless green deployments.
For example, read how Build.com achieved production rollback in 32 seconds by leveraging a Harness canary deployment strategy.
Manual Approval Stages
You want to walk before you run with Continuous Delivery and Deployment. This means that your pipelines will begin with one or more manual approvals so that some level of human governance is still retained.
For example, it’s perfectly normal to have a manual approval controlled by a dev team lead or DevOps lead that will promote your new service build/version into production.
The vast majority of our customers have manual approval stages, either as part of their QA process or their production change request process. For example, some organizations will still have a dedicated QA team that performs regression tests or UAT prior to any code being approved for production. In other orgs, a centralized DevOps team or IT Operations team may govern production change requests, hence any pipeline promotion would require one or more manual approvals.
It’s therefore important to map out any manual approval stages, and the key stakeholders/gatekeepers for each stage.
Once your teams are confident in the deployment pipeline process itself you can slowly remove manual approvals until you reach a point where the pipeline is fully automated from build to ship. This is known as Continuous Deployment.
Conditions For Triggering Pipelines
The last part of building your cloud-native pipeline is to define the actual condition(s) under which you want your pipeline to run.
Most organizations who want to practice Continuous Delivery will trigger pipelines based on a specific time of day, or trigger them manually using a UI or webhook when the time is right. 75% of our customers do this today.
Mature organizations that practice Continuous Deployment will trigger pipelines based on a new build or version of an artifact. Approximately 25% of our customers do this today.
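These two trigger styles can be expressed as a simple predicate over incoming events. The event shapes and mode names below are hypothetical, just to show the distinction:

```python
def should_trigger(pipeline_mode, event):
    """Decide whether an incoming event starts the pipeline.
    Continuous Delivery pipelines run on manual/webhook/scheduled events;
    Continuous Deployment pipelines run whenever a new artifact appears."""
    if pipeline_mode == "continuous-delivery":
        return event["type"] in ("manual", "webhook", "schedule")
    if pipeline_mode == "continuous-deployment":
        return event["type"] == "new-artifact"
    return False

print(should_trigger("continuous-deployment", {"type": "new-artifact"}))  # True
print(should_trigger("continuous-delivery", {"type": "new-artifact"}))    # False
print(should_trigger("continuous-delivery", {"type": "webhook"}))         # True
```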
All of the above can be achieved in 1-2 hours with Harness Continuous Delivery as-a-Service. Take a free trial today!
I'd be interested to hear how your team is building cloud-native pipelines in 2018. What are the top considerations and decisions you're making when it comes to building pipelines?