Key takeaways
- Run two production-capable environments (blue live, green standby) so you can deploy and validate the new version without disrupting users.
- Treat release exposure as a controlled traffic-routing change (load balancer/ingress/service switch or DNS), which minimizes downtime and makes cutover predictable.
- Get a fast rollback path by switching traffic back to the previous environment—especially effective when paired with strong verification and data-safe migration practices.
Modern software delivery is headed in one direction: ship more often without betting production on every release. Blue-green deployment supports that goal by keeping two production-capable environments, blue (live) and green (standby), so you can deploy to green, verify it under production-like conditions, then switch traffic in a controlled cutover.
When you pair blue-green with automation, releases become repeatable: pre-traffic checks, health and metric validation, and a clear rollback path if signals degrade. That’s where Harness CD fits in: standardizing deployment workflows with built-in verification and rollback guardrails so teams can move fast without flying blind.
What is a Blue-Green Deployment?
A blue-green deployment is a way to make changes to an application by keeping two production-capable stacks and only sending traffic to one of them at a time.
Blue-green moves risk to places where you have more control:
- Verification (automated tests + real signals)
- Routing (a deliberate, auditable traffic shift)
- Rollback (a well-practiced switch back)
That's why blue-green is often used with automation: the more consistent your verification and promotion steps are, the more predictable your releases will be. Organizations with less mature observability also benefit from blue-green because the spare environment stays available. This allows for longer “testing in production” cycles and, should a customer-impacting incident occur, near-instantaneous rollback.
How Blue-Green Deployments Work
Blue-green has three phases:
- Deploy to green (without impacting users)
- Verify green (prove it’s ready for real traffic)
- Switch traffic (cut over from blue to green)
Let’s review the step-by-step process that production teams use.
Step 1: Keep Blue Serving Production
Your blue environment is stable and serving users. Before you touch anything, get a baseline.
Some useful baselines include:
- Error rate and latency (p50/p95/p99)
- Resource utilization (CPU, memory, saturation)
- Key business metrics (if you have them): sign-ins, checkout success, API success rates
Baselines turn “does this look okay?” into “did we regress?” It’s a small discipline that saves time during every release.
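As a concrete sketch, baseline comparison can be as small as a function that flags regressions against pre-release numbers. The metric names and thresholds below are illustrative assumptions, not recommendations:

```python
# Minimal sketch: compare post-release metrics against a pre-release baseline.
# Metric names and thresholds here are illustrative, not fixed rules.

def regressions(baseline: dict, current: dict,
                max_error_rate_increase: float = 0.01,
                max_latency_increase_pct: float = 0.20) -> list[str]:
    """Return a list of human-readable regressions; empty means no regression."""
    findings = []
    if current["error_rate"] > baseline["error_rate"] + max_error_rate_increase:
        findings.append(f"error rate {current['error_rate']:.3f} exceeds baseline")
    for pct in ("p50", "p95", "p99"):
        if current[pct] > baseline[pct] * (1 + max_latency_increase_pct):
            findings.append(f"{pct} latency {current[pct]}ms regressed vs {baseline[pct]}ms")
    return findings

baseline = {"error_rate": 0.002, "p50": 40, "p95": 180, "p99": 450}
current  = {"error_rate": 0.004, "p50": 42, "p95": 260, "p99": 470}
print(regressions(baseline, current))
```

Encoding the comparison this way means “did we regress?” has the same answer no matter who is on call during the release.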
Step 2: Deploy the New Version to Green
Deploy your new build (and any configuration changes) to the green environment. Your goal is parity.
Parity usually means:
- Same infrastructure shape (or intentionally equivalent)
- Same runtime configuration patterns
- Same security settings and network policies
- Same dependency endpoints and access
This is where consistent config management and Infrastructure as Code (IaC) really shine. If green is "almost" like blue, you'll be fixing environment drift instead of testing your app on release day.
Step 3: Verify the Green Environment
Verification is the difference between “blue-green reduces risk” and “blue-green ships outages faster.”
A practical verification set typically includes:
- Smoke tests: Fast checks for critical paths (health endpoints, auth, core read/write flows).
- Integration tests: Calls across services, databases, queues, storage, and external APIs.
- Synthetic checks: Automated requests that mimic real user behavior.
- Performance sanity checks: Not a full load test, but enough to catch obvious regressions.
As a general rule, if your verification wouldn't have caught the last few incidents you had, you shouldn't trust it until you make it better.
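One way to keep verification honest is to run every smoke check on each deploy and report all failures at once, rather than stopping at the first one. The checks below (`check_health`, `check_auth`) are hypothetical stand-ins for real calls against the green environment:

```python
# Illustrative smoke-test runner: each check is a callable that raises on failure.
# The individual checks here are hypothetical examples, not real endpoints.

def run_smoke_tests(checks: dict) -> dict:
    """Run all checks; return {name: error-or-None}. Never stops at first failure."""
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = None
        except Exception as exc:
            results[name] = str(exc)
    return results

def check_health():      # stand-in for: GET /health returns 200
    pass

def check_auth():        # stand-in for: token issuance round-trip
    raise RuntimeError("auth service unreachable")

results = run_smoke_tests({"health": check_health, "auth": check_auth})
failed = {k: v for k, v in results.items() if v is not None}
print(failed)  # non-empty means green is not ready for traffic
```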
Step 4: Prepare for Cutover
Even a release that passes its tests can fail if it isn't ready to accept production traffic.
Pre-cutover tasks often include:
- Get the application ready: Prime caches, initialize connection pools, reduce cold-start impact.
- Readiness checks: Confirm the service can accept requests (not just “running”).
- Dependency validation: Confirm green can reach the same downstream services as blue.
- Feature flag posture: Decide what ships “off by default” and what’s safe to enable later.
This step is where teams remove surprises. The goal is to make cutover boring.
Step 5: Switch Traffic from Blue to Green
Traffic switching is usually done through one of these mechanisms:
- Load balancer or ingress routing: Update target groups or routing rules to send requests to green.
- Service discovery / routing layer: Switch which service instance is registered as active.
- DNS change: Update a domain record to point to green.
In many modern environments, load balancer or ingress switching is preferred because it’s typically immediate and observable. DNS can work, but caching and propagation can create a mixed state that makes “instant cutover” and “instant rollback” less predictable.
Step 6: Monitor Closely and Keep Blue Available
After the cutover, treat the first window as a high-signal period.
Focus on:
- Error rate, latency, saturation, and logs
- Differences from your baseline
- Any known high-risk workflows (e.g., checkout, auth, payments)
Many teams keep blue “warm” for a fixed period (30–60 minutes, a few hours, or a business day) before decommissioning. The point is not to run duplicates forever. It’s to keep a clean escape hatch until you’re confident.
Step 7: Decommission or Repurpose Blue
Once the release is stable:
- Scale down the old environment, or
- Repurpose it as the next green for the following release
The longer you wait to scale down or decommission the old environment, the more the blue-green approach costs; in exchange, you keep the instant rollback option longer. The better your observability and your understanding of the application's dynamics, the cheaper this trade becomes. A common pattern is to alternate colors with each release: today's green becomes the next release's blue.
What You Need Before You Start
There's no single blue-green button to press. A few things need to be in place for this strategy to work, and your releases will be faster and less stressful if you invest here.
Environment Parity
Your two environments should be functionally equivalent in the ways that matter:
- Same network policies and access controls
- Same secrets and configuration patterns
- Same dependency endpoints
- Same observability instrumentation
Parity doesn't always mean identical instance types or autoscaling settings. It means the environment behaves the same way once it's serving production.
Reliable Configuration Management
Blue-green often fails because configuration changes are treated like afterthoughts.
If you deploy to green but miss a critical environment variable, you might not notice until real users hit the new version.
Strong practices include:
- Versioning config where possible
- Using consistent config templates
- Validating required variables at startup
- Treating secrets as first-class deployment inputs
Strong Health Checks and Readiness Signals
A health check should let you know if an instance can handle traffic.
A lot of teams start with "/health returns 200 if the process is alive." That's a good start, but it doesn't protect cutovers.
A meaningful readiness check usually includes:
- Ability to connect to critical dependencies
- Ability to process a basic request path
- Evidence initialization is complete (migrations, caches, config load)
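A readiness signal that covers those three points might look like the following sketch. The `Readiness` class and dependency names are illustrative, not a specific framework's API:

```python
# Sketch of a readiness signal that goes beyond "process is alive":
# it verifies critical dependencies and completed initialization.
# The class and dependency names are illustrative assumptions.

class Readiness:
    def __init__(self):
        self.migrations_done = False
        self.dependency_checks = {}   # name -> callable returning bool

    def register(self, name, check):
        self.dependency_checks[name] = check

    def status(self) -> tuple[bool, dict]:
        detail = {"migrations": self.migrations_done}
        for name, check in self.dependency_checks.items():
            try:
                detail[name] = bool(check())
            except Exception:
                detail[name] = False
        return all(detail.values()), detail

r = Readiness()
r.register("database", lambda: True)   # stand-in for a real connection test
r.register("cache", lambda: True)
ready, detail = r.status()
print(ready, detail)   # not ready: migrations haven't completed yet
r.migrations_done = True
print(r.status()[0])   # ready once initialization is done
```

Wiring a check like this into your `/ready` endpoint means the router only sends traffic once the instance can actually serve it.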
Automated Testing and Verification
Blue-green makes it easier to test in an environment that resembles production, but it doesn't do the testing for you. While blue-green accommodates manual testing better than an automated canary deployment does, automated testing is still the stronger option.
At minimum, you want automated checks that run every time you deploy to green. Over time, you’ll mature those checks into promotion gates based on metrics and error budgets.
A Rollback Plan That Includes Data
Rolling back traffic is easy. Rolling back data is not.
Before adopting blue-green, decide how you’ll handle:
- Schema migrations
- Backward compatibility
- Data transformations
If your release requires a migration that the previous version can’t tolerate, switching traffic back may not restore functionality. Data safety is where most “instant rollback” stories fall apart. Database DevOps tools can automate schema migrations in a controlled fashion, and approaches like the expand/contract pattern help preserve instant rollback.
Benefits of a Blue-Green Deployment
When teams adopt blue-green intentionally, the benefits are practical and measurable.
Minimized Downtime
Because you set up the new environment before switching traffic, the only time you have to wait is for the routing change. Users don't see you "install" a new version; they just see the traffic change.
Reduced Deployment Risk
Blue-green separates deployment from exposure.
You can deploy the new version to green and check it without affecting users. This lowers the risk that a bad build will bring down production.
Fast Rollback
If something goes wrong right after cutover, you can usually fix it by sending traffic back to blue.
That speed is important. When problems are found early, it changes "major incident" to "short disruption." As one SRE Manager described it, “My CEO was next to me when an update brought our whole service down. He was shocked at how calm I was. Using Harness, we had everything back up in a couple of minutes. No harm done.”
Cleaner Release Validation
Because green is production-capable, your validation is more realistic than a staging environment that doesn’t match production traffic patterns, scale, or dependency behavior. In effect, green is production, not an approximation of it.
Supports Continuous Delivery
Blue-green helps teams ship smaller changes more frequently because the mechanics of releasing become repeatable.
The goal isn’t “deploy faster at all costs.” It’s “deploy predictably and recover quickly.”
Better Control Over Capacity
You can scale green correctly before exposure. You can also use automation to ensure that routing changes occur only after readiness and verification pass.
Risks and Pitfalls of a Blue-Green Deployment
On paper, blue-green is simple. These are the real-world issues that cause most of the trouble in production.
Database Migrations and Schema Compatibility
This is the most common stumbling block.
When blue and green both talk to the same database, that database is a shared dependency. There may be a time during the cutover when both versions are still in use (for example, because some traffic stays, connections drain slowly, or you keep blue online for rollback).
That means your database changes should usually be backward and forward compatible:
- Backward compatible: The new schema still works with the old application version.
- Forward compatible: The new application version still works with the old schema.
Patterns teams rely on:
- Expand/contract migrations: Add new columns first, deploy, migrate usage, then remove old columns later.
- Avoid destructive changes during cutover: Delay dropping columns or constraints until you’re confident.
- Backfills as async jobs: Keep the release path fast and safe.
If your change requires a hard break (removing a required field, changing semantics, rewriting data formats), plan a multi-step release instead of a single big switch.
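The expand/contract idea can be demonstrated end to end with SQLite. The `users` table and column names are made up for illustration:

```python
# Expand/contract sketch with SQLite: add the new column first ("expand"),
# keep the data readable by both app versions, and defer the destructive
# "contract" step until rollback is no longer needed.
# Table and column names are hypothetical.

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
db.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace')")

# Expand: the old app version ignores the new nullable column, so this is
# safe to run while blue is still serving traffic.
db.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# Backfill as a separate, resumable job rather than inside the release path.
db.execute("UPDATE users SET display_name = full_name WHERE display_name IS NULL")

# Both versions can now read the row; nothing has been dropped yet.
row = db.execute("SELECT full_name, display_name FROM users").fetchone()
print(row)  # ('Ada Lovelace', 'Ada Lovelace')

# Contract (much later): drop full_name only after green is stable and a
# rollback to the old version is no longer on the table.
```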
Stateful Traffic, Sessions, and Caches
If your application stores session state in memory, switching environments can log users out or break workflows.
Mitigations include:
- Store sessions in a shared external store (e.g., Redis, database)
- Use stateless auth tokens when appropriate
- Warm caches so users don’t take the hit right after cutover
Also, keep an eye on sticky sessions. If a load balancer keeps users on old targets, you could end up in a mixed-version state by accident.
DNS Propagation Delays
DNS changes take effect only as cached records expire (their TTL). That means cutover and rollback roll out gradually rather than all at once: some users may hit blue while others hit green, leaving you in a mixed state.
If you need tight control, you should route at the load balancer or ingress layer.
Environment Drift
If green isn’t equivalent to blue, you'll see problems that have nothing to do with your code:
- Different network rules
- Missing secrets
- Different autoscaling behavior
- Different dependency endpoints
This is why teams often pair blue-green with IaC and immutable infrastructure practices.
Cost and Capacity
It costs more to run two production environments.
Sometimes that’s a smart trade: the extra spend is an investment in reliability. If it's a problem, think about:
- Keeping green smaller until verification passes
- Scaling up right before cutover
- Using canary deployments when full parallel capacity isn’t feasible
“Green Passed Tests” Isn’t the Same as “Green Is Safe”
Tests are necessary, but production failures can involve traffic patterns, unusual inputs, or dependency behavior.
Treat tests as a gate, not a guarantee. Pair them with strong monitoring after cutover and a clear rollback plan.
Best Practices for Safe Cutovers
To make blue-green feel like a routine part of delivery, focus on the moments of highest risk: verification and the traffic switch.
Use Progressive Exposure When You Can
Classic blue-green is a full cutover. Many teams use it with progressive rollout methods:
- Start with internal users
- Route a small percentage of traffic to green first
- Increase gradually based on signals
If you can’t do progressive exposure, compensate with stronger pre-cutover verification and clearer rollback criteria.
Define Go/No-Go Criteria Before You Deploy
Don’t invent thresholds during an incident.
Examples:
- Error rate must not exceed baseline by more than X%
- p95 latency must stay under Y ms
- No increase in critical error codes
- Business flow success rate remains stable
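Criteria like these are easiest to enforce when they're encoded as data rather than tribal knowledge. A minimal sketch, with illustrative metric names and limits:

```python
# Sketch: encode go/no-go thresholds as data before the release, then evaluate
# them mechanically after cutover. Metric names and limits are illustrative.

def go_no_go(metrics: dict, criteria: dict) -> tuple[bool, list[str]]:
    """criteria maps metric name -> max allowed value. Returns (go, violations)."""
    violations = [
        f"{name}={metrics[name]} exceeds limit {limit}"
        for name, limit in criteria.items()
        if metrics[name] > limit
    ]
    return len(violations) == 0, violations

criteria = {
    "error_rate_pct": 1.0,       # error rate must stay under 1%
    "p95_latency_ms": 300,       # p95 latency must stay under 300 ms
    "checkout_failure_pct": 2.0, # business flow must remain stable
}
metrics = {"error_rate_pct": 0.4, "p95_latency_ms": 350, "checkout_failure_pct": 1.1}
go, violations = go_no_go(metrics, criteria)
print(go, violations)
```

Because the thresholds exist before the release, the rollback decision during an incident is a lookup, not a debate.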
Automate Verification and Keep It Consistent
If verification depends on manual checks or “someone’s gut feel,” it won’t scale.
Automate:
- Smoke tests
- Synthetic checks
- Metric-based validations
- Security/compliance gates where required
Make Rollback a Practiced Action
Rollback should not be a plan you execute for the first time during an incident.
Practice it:
- Rehearse rollback in non-production
- Document what triggers it
- Automate traffic reversion when possible
Keep Observability in the Workflow
“Deploy succeeded” is not the same as “release succeeded.”
Build post-cutover checks into your process:
- Health/readiness confirmation
- Metric checks after cutover
- Log and trace sampling
- Alerts aligned with go/no-go criteria
Plan Data Changes as Multi-Step Releases
If your application requires schema changes:
- Prefer expand/contract
- Keep both versions compatible during the cutover window
- Run backfills separately
Think: deploy code, shift traffic, validate, then finalize cleanup.
Blue-Green vs. Rolling vs. Canary vs. Red/Black
Different strategies lower risk in different ways. Pick the method that works best with your routing skills, level of observability, and budget.
Blue-Green vs Rolling Deployments
In a rolling deployment, instances are updated in place, with older versions gradually replaced by newer ones.
- Pros: Lower infrastructure cost; no parallel environment required.
- Cons: Rollback can be slower because you may have a mixed fleet; debugging can be harder when multiple versions run at once.
Choose rolling when you want simplicity and lower cost.
Choose blue-green when you want a clean separation between versions and a fast, routing-based rollback.
Blue-Green vs Canary Deployments
A canary deployment routes a small percentage of traffic to the new version first, then increases exposure if metrics look good.
- Pros: Smaller blast radius; catches issues before full rollout.
- Cons: Requires strong observability and traffic management.
Choose canary when you can route by percentage or segment and you’re ready to promote based on metrics.
Choose blue-green when you prefer a straightforward “validate, then switch” model and can run parallel capacity.
Red/Black vs Blue/Green
“Red/black” is often used interchangeably with blue-green. In most contexts, it describes the same pattern: one environment is live, one is idle, and traffic is switched between them.
If a source uses red/black, treat it as a naming variant unless it defines a specific implementation detail.
Blue-Green Deployment in Kubernetes
Kubernetes can make blue-green simpler because traffic routing is a first-class concept through Services and Ingress. The key is deciding what “two environments” means for your setup.
Pattern 1: Two Deployments, One Service (Selector Switch)
- Two Deployments (blue and green) with different labels
- One Service
- Switch traffic by updating the Service selector
Pros: Simple, fast, immediate cutover.
Cons: Requires care with readiness and connection draining. Label mistakes can drop traffic.
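Pattern 1 can be sketched with a single Service manifest; the names and labels here (`myapp`, `version: blue`) are hypothetical:

```yaml
# Hypothetical Service whose selector pins traffic to the blue Deployment.
# Blue and green Deployments carry the same "app" label but different
# "version" labels.
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue     # cutover = change this to "green"
  ports:
    - port: 80
      targetPort: 8080
```

Cutover is then a one-line selector change, for example: `kubectl patch service myapp -p '{"spec":{"selector":{"app":"myapp","version":"green"}}}'`. Rollback is the same patch with the colors reversed.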
Pattern 2: Two Services, One Ingress (Route Switch)
- Blue and green each have their own Service
- Ingress (or gateway) routes to the active Service
Pros: Clear separation; supports advanced routing.
Cons: More objects to manage.
Pattern 3: Two Namespaces (Environment Isolation)
- Namespace “blue” and namespace “green,” each with the full stack
Pros: Strong isolation; easier parity validation.
Cons: More overhead; shared dependencies still need careful planning.
Readiness, Liveness, and Pre-Traffic Validation
In Kubernetes, probes matter:
- Liveness probe: Is the process alive?
- Readiness probe: Is it safe to receive traffic?
The readiness probe determines whether the Service sends traffic to a Pod. If readiness flips to true before dependencies are available, your deployment can look healthy while cutover fails.
Helpful pre-traffic steps:
- Run smoke tests against the green endpoint
- Warm caches and connection pools
- Validate config and secret injection
Connection Draining and Graceful Termination
To avoid dropping in-flight requests:
- Set terminationGracePeriodSeconds appropriately
- Ensure your load balancer respects readiness and removes targets cleanly
- Implement graceful shutdown in the application
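The application side of graceful termination follows a common shape: mark yourself not-ready on SIGTERM, reject new work, drain what's in flight, then exit. This Python sketch shows the shape only; real servers wire this into their framework's lifecycle hooks:

```python
# Graceful-shutdown sketch: on SIGTERM, stop accepting new work, finish
# in-flight requests, then exit. Details vary by framework; the request
# tracking here is a simplified stand-in.

import signal
import threading

shutting_down = threading.Event()
in_flight = []   # stand-in for a real in-flight request tracker

def handle_sigterm(signum, frame):
    shutting_down.set()          # readiness should now report "not ready"
    # ... wait for in_flight to drain, then close listeners and exit

signal.signal(signal.SIGTERM, handle_sigterm)

def accept(request):
    if shutting_down.is_set():
        return "rejected"        # load balancer retries on another target
    in_flight.append(request)
    return "accepted"

print(accept("r1"))
shutting_down.set()              # simulate SIGTERM for demonstration
print(accept("r2"))
```

Pairing this with `terminationGracePeriodSeconds` gives Kubernetes time to remove the Pod from endpoints before the process exits.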
Automating Blue-Green in CI/CD Pipelines
Blue-green is most effective when it’s repeatable. That’s where automation is key.
Automation reduces:
- Manual steps during cutover (which increase human error)
- Inconsistent verification (which increases release risk)
A strong blue-green pipeline typically includes:
1. Build and Package
- Build artifacts (container images, packages)
- Run unit tests and security scans
- Produce a versioned, traceable output
2. Deploy & Test in QA
- Deploy your build to a test environment
- Run automated and manual checks
- Perform Dynamic Application Security Scans
- Approve for Production release
3. Deploy to Green
- Provision or update green infrastructure
- Deploy the application version
- Apply configuration and secrets
4. Verify
- Run smoke tests and integration tests
- Perform automated checks on key metrics
- Validate health and readiness
5. Switch Traffic
- Update routing (load balancer, ingress, service selector)
- Confirm traffic is flowing to green
6. Post-Deploy Monitoring and Rollback Guardrails
- Observe key signals for a defined window
- Roll back traffic automatically or semi-automatically if thresholds are exceeded
This is also where teams increasingly rely on intelligent signal analysis to reduce noise and spot regressions faster. You don’t need “perfect AI” to benefit; you need consistent data and clear thresholds.
Automation should not hide what’s happening. It should make the workflow reliable, auditable, and observable.
How to Choose Your Traffic Switching Mechanism
Cutovers don't all work the same way. Pick the mechanism that fits how quickly the switch needs to happen and how much control you need when you roll back.
Load Balancer / Ingress Switch
Often the cleanest option:
- Fast and controllable
- Supports health-aware routing
- Works well with connection draining
DNS Cutover
DNS can work, but understand the tradeoffs:
- Propagation can be unpredictable
- Caching behaviors vary across clients and resolvers
- Rollback may not be immediate
If you use DNS:
- Set TTL thoughtfully and validate real behavior
- Monitor traffic distribution during cutover
- Expect a mixed state during the transition
Service Discovery / Registry Switching
Some service meshes or discovery layers can change the active endpoints.
This can be very useful if you already have a good routing layer and good visibility. If your team is still working on that discipline, keep things simple.
Security and Compliance Considerations
Blue-green changes how you demonstrate control and traceability.
If you work in regulated environments, you should plan for:
- Audit trails: Who approved the deployment and the traffic switch?
- Separation of duties: Are build, deploy, and promotion roles clearly defined?
- Change management: Can you link a release to a ticket, PR, or change request?
- Policy enforcement: Are scan/sign/approval gates enforced consistently?
A mature deployment workflow makes releases faster because it removes uncertainty—not because it cuts corners.
Blue-Green Deployments Make Releases Predictable
Blue-green deployment reduces release risk by separating “deploy” from “expose.” You validate the new version in a production-capable green environment, cut traffic over when it’s ready, and keep blue available for a fast switch-back if something breaks, especially when your data changes are designed for compatibility.
To make that workflow consistent at scale, invest in automation around verification, promotion, and rollback criteria. Harness CD helps teams standardize these steps with pipelines and deployment safeguards, so blue-green becomes a routine, measurable part of delivery.
Blue-green Deployment: Frequently Asked Questions (FAQs)
Here are quick answers to the blue-green questions teams ask most when planning (or troubleshooting) real deployments.
Is blue-green deployment truly zero downtime?
Blue-green can be near-zero downtime, but it depends on your cutover method and how your app handles state. If you have long-lived connections, in-memory sessions, or heavy cache warm-up, plan for a graceful handoff and validate with real traffic signals.
How long should you keep the blue environment after a blue-green cutover?
You should keep blue long enough to be confident that green is stable under real user load, then scale it down intentionally. Many teams use a fixed observation window (for example, 30–60 minutes) and extend it for higher-risk releases or business-critical periods.
Can you use blue-green deployment with database migrations?
Yes, as long as your database changes are compatible with both versions during the cutover window. Favor expand/contract patterns and avoid destructive changes until you’ve confirmed green is stable and rollback is no longer needed.
What metrics should you monitor during a blue-green deployment cutover?
Watch error rate, latency (especially p95/p99), and saturation signals (CPU, memory, queue depth) alongside key dependency health like database and cache performance. If you have them, add business metrics such as sign-in, checkout, or payment success to catch user-impacting regressions fast.
Blue-green vs canary deployment: which is better?
Blue-green is a strong fit when you want a clean before/after environment and a fast routing-based rollback. Canary is better when you can shift traffic gradually and promote based on metrics, reducing blast radius at the cost of added routing and observability complexity.
Do you need two Kubernetes clusters for a blue-green deployment?
No. Many teams implement blue-green in a single Kubernetes cluster using separate Deployments, Services, or namespaces and switch traffic via Service selectors or Ingress routing. Two clusters can add isolation, but they also add operational overhead, so it’s typically a tradeoff rather than a requirement.
