We’ve come a long way in how we build and deliver software. Continuous Integration (CI) is automated, Continuous Delivery (CD) is fast, and teams can ship code quickly and often. But environments are still messy.
Shared staging systems break when too many teams deploy at once, and developers wait on infrastructure changes. Test environments get created and then forgotten, and over time what is running in the cloud stops matching what was written in code.
We have made deployments smooth and reliable, but managing environments still feels manual and unpredictable. That gap has quietly become one of the biggest slowdowns in modern software delivery.
This is the hidden bottleneck in platform engineering, and it's a challenge enterprise teams are actively working to solve.
As Steve Day, Enterprise Technology Executive at National Australia Bank, shared:
“As we’ve scaled our engineering focus, removing friction has been critical to delivering better outcomes for our customers and colleagues. Partnering with Harness has helped us give teams self-service access to environments directly within their workflow, so they can move faster and innovate safely, while still meeting the security and governance expectations of a regulated bank.”
At Harness, Environment Management is a first-class capability inside our Internal Developer Portal. It transforms environments from manual, ticket-driven assets into governed, automated systems that are fully integrated with Harness Continuous Delivery and Infrastructure as Code Management (IaCM).

This is not another self-service workflow. It is environment lifecycle management built directly into the delivery platform.
The result is faster delivery, stronger governance, and lower operational overhead without forcing teams to choose between speed and control.
Continuous Delivery answers how code gets deployed. Infrastructure as Code defines what infrastructure should look like. But the lifecycle of environments has often lived between the two.

Teams stitch together Terraform projects, custom scripts, ticket queues, and informal processes just to create and update environments. Day two operations such as resizing infrastructure, adding services, or modifying dependencies require manual coordination. Ephemeral environments multiply without cleanup. Drift accumulates unnoticed.
The outcome is familiar: slower innovation, rising cloud spend, and increased operational risk.
Environment Management closes this gap by making environments real entities within the Harness platform. Provisioning, deployment, governance, and visibility now operate within a single control plane.
Harness is the only platform that unifies environment lifecycle management, infrastructure provisioning, and application delivery under one governed system.
At the center of Environment Management are Environment Blueprints.
Platform teams define reusable, standardized templates that describe exactly what an environment contains. A blueprint includes infrastructure resources, application services, dependencies, and configurable inputs such as versions or replica counts. Role-based access control and versioning are embedded directly into the definition.

Developers consume these blueprints from the Internal Developer Portal and create production-like environments in minutes. No tickets. No manual stitching between infrastructure and pipelines. No bypassing governance to move faster.
Consistency becomes the default. Governance is built in from the start.
Environment Management handles more than initial provisioning.
Infrastructure is provisioned through Harness IaCM. Services are deployed through Harness CD. Updates, modifications, and teardown actions are versioned, auditable, and governed within the same system.
Teams can define time-to-live policies for ephemeral environments so they are automatically destroyed when no longer needed. This reduces environment sprawl and controls cloud costs without slowing experimentation.
Harness EM also introduces drift detection. As environments evolve, unintended changes can occur outside declared infrastructure definitions. Drift detection provides visibility into differences between the blueprint and the running environment, allowing teams to detect issues early and respond appropriately. In regulated industries, this visibility is essential for auditability and compliance.

For enterprises operating at scale, self-service without control is not viable.
Environment Management leverages Harness’s existing project and organization hierarchy, role-based access control, and policy framework. Platform teams can control who creates environments, which blueprints are available to which teams, and what approvals are required for changes. Every lifecycle action is captured in an audit trail.
This balance between autonomy and oversight is critical. Environment Management delivers that balance. Developers gain speed and independence, while enterprises maintain the governance they require.
"Our goal is to make environment creation a simple, single action for developers so they don't have to worry about underlying parameters or pipelines. By moving away from spinning up individual services and using standardized blueprints to orchestrate complete, production-like environments, we remove significant manual effort while ensuring teams only have control over the environments they own."
— Dinesh Lakkaraju, Senior Principal Software Engineer, Boomi
Environment Management represents a shift in how internal developer platforms are built.
Instead of focusing solely on discoverability or one-off self-service actions, it brings lifecycle control, cost governance, and compliance directly into the developer workflow.
Developers can create environments confidently. Platform engineers can encode standards once and reuse them everywhere. Engineering leaders gain visibility into cost, drift, and deployment velocity across the organization.
Environment sprawl and ticket-driven provisioning do not have to be the norm. With Environment Management, environments become governed systems, not manual processes. And with CD, IaCM, and IDP working together, Harness is turning environment control into a core platform capability instead of an afterthought.
This is what real environment management should look like.

Engineering teams are generating more shippable code than ever before — and today, Harness is shipping five new capabilities designed to help teams release confidently. AI coding assistants lowered the barrier to writing software, and the volume of changes moving through delivery pipelines has grown accordingly. But the release process itself hasn't kept pace.
The evidence shows up in the data. In our 2026 State of DevOps Modernization Report, we surveyed 700 engineering teams about what AI-assisted development is actually doing to their delivery. One finding stands out: while 35% of the most active AI coding users are already releasing daily or more, those same teams have the highest rate of deployments needing remediation (22%) and the longest MTTR at 7.6 hours.
This is the velocity paradox: the faster teams can write code, the more pressure accumulates at the point of release, where the process hasn't changed nearly as much as the tooling that feeds it.
The AI Delivery Gap
What changed is well understood. For years, the bottleneck in software delivery was writing code. Developers couldn't produce changes fast enough to stress the release process. AI coding assistants changed that. Teams are now generating more change across more services, more frequently than before — but the tools for releasing that change are largely the same.
In the past, DevSecOps vendors built entire separate products to coordinate multi-team, multi-service releases. That made sense when CD pipelines were simpler. It doesn't make sense now. At AI speed, a separate tool means another context switch, another approval flow, and another human-in-the-loop at exactly the moment you need the system to move on its own.
The tools that help developers write code faster have created a delivery gap that only widens as adoption grows.
Today Harness is releasing five capabilities, all natively integrated into Continuous Delivery. Together, they cover the full arc of a modern release: coordinating changes across teams and services, verifying health in real time, managing schema changes alongside code, and progressively controlling feature exposure.
Release Orchestration replaces Slack threads, spreadsheets, and war-room calls that still coordinate most multi-team releases. Services and the teams supporting them move through shared orchestration logic with the same controls, gates, and sequence, so a release behaves like a system rather than a series of handoffs. And everything is seamlessly integrated with Harness Continuous Delivery, rather than in a separate tool.
AI-Powered Verification and Rollback connects to your existing observability stack, automatically identifies which signals matter for each release, and determines in real time whether a rollout should proceed, pause, or roll back. Most teams have rollback capability in theory. In practice it's an emergency procedure, not a routine one. Ancestry.com made it routine and saw a 50% reduction in overall production outages, with deployment-related incidents dropping significantly.
Database DevOps, now with Snowflake support, brings schema changes into the same pipeline as application code, so the two move together through the same controls with the same auditability. If a rollback is needed, the application and database schema can roll back together seamlessly. This matters especially for teams building AI applications on warehouse data, where schema changes are increasingly frequent and consequential.
Improved pipeline and policy support for feature flags and experimentation enables teams to deploy safely and release progressively to the right users, even as the number of releases increases with AI-generated code. They can quickly measure impact on technical and business metrics, and stop or roll back when results are off track. All of this happens within the familiar Harness user interface teams already use for CI/CD.
Warehouse-Native Feature Management and Experimentation lets teams test features and measure business impact directly with data warehouses like Snowflake and Redshift, without ETL pipelines or shadow infrastructure. This way they can keep PII and behavioral data inside governed environments for compliance and security.
These aren't five separate features. They're one answer to one question: can we safely keep going at AI speed?
Traditional CD pipelines treat deployment as the finish line. The model Harness is building around treats it as one step in a longer sequence: application and database changes move through orchestrated pipelines together, verification checks real-time signals before a rollout continues, features are exposed progressively, and experiments measure actual business outcomes against governed data.
A release isn't complete when the pipeline finishes. It's complete when the system has confirmed the change is healthy, the exposure is intentional, and the outcome is understood.
That shift from deployment to verified outcome is what Harness customers say they need most. "AI has made it much easier to generate change, but that doesn't mean organizations are automatically better at releasing it," said Marc Pearce, Head of DevOps at Intelliflo. "Capabilities like these are exactly what teams need right now. The more you can standardize and automate that release motion, the more confidently you can scale."
The real shift here is operational. The work of coordinating a release today depends heavily on human judgment, informal communication, and organizational heroics. That worked when the volume of change was lower. As AI development accelerates, it's becoming the bottleneck.
The release process needs to become more standardized, more repeatable, and less dependent on any individual's ability to hold it together at the moment of deployment. Automation doesn't just make releases faster. It makes them more consistent, and consistency is what makes scaling safe.
For Ancestry.com, implementing Harness helped them achieve 99.9% uptime by cutting outages in half while accelerating deployment velocity threefold.
At Speedway Motors, progressive delivery and 20-second rollbacks enabled a move from biweekly releases to multiple deployments per day, with enough confidence to run five to ten feature experiments per sprint.
AI made writing code cheap. Releasing that code safely, at scale, is still the hard part.
Harness Release Orchestration, AI-Powered Verification and Rollback, Database DevOps, Warehouse-Native Feature Management and Experimentation, and Improved Pipeline and Policy Support for FME are available now. Learn more and book a demo.

On March 19th, the risks of running open execution pipelines — where what code runs in your CI/CD environment is largely uncontrolled — went from theoretical to catastrophic.
A threat actor known as TeamPCP compromised the GitHub Actions supply chain at a scale we haven't seen before (tracked as CVE-2026-33634, CVSS 9.4). They compromised Trivy, the most widely used vulnerability scanner in the cloud-native ecosystem, and turned it into a credential-harvesting tool that ran inside victims' own pipelines.
Between March 19 and March 24, 2026, organizations running affected tag-based GitHub Actions references were sending their AWS tokens, SSH keys, and Kubernetes secrets directly to the attacker. SANS Institute estimates over 10,000 CI/CD workflows were directly affected. According to multiple security research firms, the downstream exposure extends to tens of thousands of repositories and hundreds of thousands of accounts.
Five ecosystems. Five days. One stolen Personal Access Token.
This is a fundamental failure of the open execution pipeline model — where what runs in your pipeline is determined by external references to public repositories, mutable version tags, and third-party code that executes with full privileges. GitHub Actions is the most prominent implementation.
The alternative, governed execution pipelines, where what runs is controlled through policy gates, customer-owned infrastructure, scoped credentials, and immutable references, is the model we designed Harness around years ago, precisely because we saw this class of attack coming.
TeamPCP wasn't an anomaly; it was the inevitable conclusion of an eighteen-month escalation in CI/CD attack tactics.
CVE-2025-30066. Attackers compromised a PAT from an upstream dependency (reviewdog/action-setup) and force-pushed malicious code to every single version tag of tj-actions/changed-files. 23,000 repositories were exposed. The attack was later connected to a targeted campaign against Coinbase. CISA issued a formal advisory.
This proved that the industry's reliance on mutable tags (like @v2) was a serious structural vulnerability. According to Wiz, only 3.9% of repositories pin to immutable SHAs. The other 96% are trusting whoever owns the tag today.
The first self-replicating worm in the CI/CD ecosystem. Shai-Hulud 2.0 backdoored 796 npm packages representing over 20 million weekly downloads — including packages from Zapier, PostHog, and Postman.
It used TruffleHog to harvest 800+ credential types, registered compromised machines as self-hosted GitHub runners named SHA1HULUD for persistent C2 over github.com, and built a distributed token-sharing network where compromised machines could replace each other's expired credentials.
PostHog's candid post-mortem revealed that attackers stole their GitHub bot's PAT via a pull_request_target workflow exploit, then used it to steal npm publishing tokens from CI runner secrets. Their admission that this kind of attack "simply wasn't something we'd prepared for" reflects the industry-wide gap between application security and CI/CD security maturity. CISA issued another formal advisory.
TeamPCP went after the security tools themselves.
They exploited a misconfigured GitHub Actions workflow to steal a PAT from Aqua Security's aqua-bot service account. Aqua detected the breach and initiated credential rotation — but reporting suggests the rotation did not fully cut off attacker access. TeamPCP appears to have retained or regained access to Trivy's release infrastructure, enabling the March 19 attack weeks after initial detection.
On March 19, they force-pushed a malicious "Cloud Stealer" to 76 of 77 version tags in trivy-action and all 7 tags in setup-trivy. Simultaneously, they published an infected Trivy binary (v0.69.4) to GitHub Releases and Docker Hub. Every pipeline referencing those tags by name started executing the attacker's code on its next run. No visible change to the release page. No notification. No diff to review.
TeamPCP's payload was purpose-built for CI/CD runner environments:
Memory Scraping. It read /proc/*/mem to extract decrypted secrets held in RAM. GitHub's log-masking can't hide what's in process memory.
Cloud Metadata Harvesting. It queried the AWS Instance Metadata Service (IMDS) at 169.254.169.254, pivoting from "build job" to full IAM role access in the cloud.
Filesystem Sweep. It searched over 50 specific paths — .env files, .aws/credentials, .kube/config, SSH keys, GPG keys, Docker configs, database connection strings, and cryptocurrency wallet keys.
Encrypted Exfiltration. All data was bundled into tpcp.tar.gz, encrypted with AES-256 and RSA-4096, and sent to typosquatted domains like scan.aquasecurtiy[.]org (note the "tiy"). These domains returned clean verdicts from threat intelligence feeds during the attack. As a fallback, the stealer created public GitHub repos named tpcp-docs under the victim's own account.
The malicious payload executed before the legitimate Trivy scan. Pipelines appeared to work normally. CrowdStrike noted: "To an operator reviewing workflow logs, the step appears to have completed successfully."
Sysdig observed that the vendor-specific typosquat domains were a deliberate deception — an analyst reviewing CI/CD logs would see traffic to what appears to be the vendor's own domain.
It took Aqua five days to fully evict the attacker, during which TeamPCP pushed additional malicious Docker images (v0.69.5 and v0.69.6).
Why did this work so well? Because GitHub Actions is the leading example of an open execution pipeline — where what code runs in your pipeline is determined by external references that anyone can modify.
This trust problem isn't new. Jenkins had a similar issue with plugins. Third-party code ran with full process privileges. But Jenkins ran inside your firewall; exfiltrating data required getting past your network perimeter.
GitHub Actions took the same open execution approach but moved execution to cloud-hosted runners with broad internet egress, making exfiltration trivially easy. TeamPCP's Cloud Stealer just needed to make an HTTPS POST to an external domain, which runners are designed to do freely.
Here are a few reasons why open execution pipelines break at scale:
Mutable Trust. When you use @v2, you are trusting a pointer, not a piece of code. Tags can be silently redirected by anyone with write access. TeamPCP rewrote 76 tags in a single operation. 96% of the ecosystem is exposed. (A small pinning sketch follows this list.)
Flat Privileges. Third-party Actions run with the same permissions as your code. No sandbox. No permission isolation. This is why TeamPCP targeted security scanners — tools that by design have elevated access to your pipeline infrastructure. The attacker doesn't need to break in. The workflow invites them in.
Secret Sprawl. Secrets are typically injected into the runner's environment or process memory during job execution, where they remain accessible for the job's duration. TeamPCP's /proc/*/mem scraper didn't need any special privilege. It just needed to be running on the same machine.
Unbounded Credential Cascades. There is no architectural boundary that stops a credential stolen in one context from unlocking another. TeamPCP proved this definitively: Trivy → Checkmarx → LiteLLM → AI API keys across thousands of enterprises. One PAT, five ecosystems.
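The mutable-trust problem has a mechanical mitigation: pin every `uses:` reference to a full commit SHA instead of a tag. Below is a minimal, illustrative Python sketch, not a Harness or GitHub tool, that rewrites tag references in workflow files to the commit SHA the tag currently points to. The regex and the public `commits` endpoint are simplifying assumptions; it ignores nested action paths, reusable workflows, and API rate limits.

```python
#!/usr/bin/env python3
"""Rewrite mutable action tags (owner/repo@v2) to immutable commit SHAs.

Rough sketch: resolves each tag through the public GitHub API and appends
the original tag as a comment so reviewers can still see the intent.
"""
import json
import pathlib
import re
import urllib.request

USES_RE = re.compile(r"(uses:\s*)([\w.-]+/[\w.-]+)@([\w.-]+)")

def resolve_sha(repo: str, ref: str) -> str:
    # The commits endpoint dereferences a branch or tag to a commit SHA.
    url = f"https://api.github.com/repos/{repo}/commits/{ref}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["sha"]

def pin_file(path: pathlib.Path) -> None:
    text = path.read_text()

    def _pin(match: re.Match) -> str:
        prefix, repo, ref = match.groups()
        if re.fullmatch(r"[0-9a-f]{40}", ref):
            return match.group(0)  # already pinned to a SHA
        sha = resolve_sha(repo, ref)
        return f"{prefix}{repo}@{sha}  # was {ref}"

    path.write_text(USES_RE.sub(_pin, text))

if __name__ == "__main__":
    for wf in pathlib.Path(".github/workflows").glob("*.y*ml"):
        pin_file(wf)
        print(f"pinned {wf}")
```

Pinning does not remove the need to review what the SHA actually contains, but it ensures the code you reviewed is the code that runs on the next trigger.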
Harness CI/CD pipelines are built as governed execution pipelines — where what runs is controlled through customer-owned infrastructure, policy gates, scoped credentials, immutable references, and explicit trust boundaries. At its core is the Delegate — a lightweight worker process that runs inside your infrastructure (your VPC, your Kubernetes cluster), executes tasks locally, and communicates with the Harness control plane via outbound-only connections.
When we designed this architecture, we assumed the execution plane would become the primary target in the enterprise. If TeamPCP tried to attack a Harness-powered environment, they would hit three architectural walls.
The Architecture.
The Delegate lives inside your VPC or cluster. It communicates with our SaaS control plane via outbound-only HTTPS/WSS. No inbound ports are opened.
The Defense.
You control the firewall. Allowlist app.harness.io and the specific endpoints your pipelines need, deny everything else. TeamPCP's exfiltration to typosquat domains would fail at the network layer — not because of a detection rule, but because the path doesn't exist. Both typosquat domains returned clean verdicts from threat intel feeds. Egress filtering by allowlist is more reliable than detection by reputation.
The Architecture.
Rather than bulk-injecting secrets as flat environment variables at job start, Harness can resolve secrets at runtime through your secret manager — HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault — via the Delegate, inside your network. Harness SaaS stores encrypted references and metadata, not plaintext secret values.
The Defense.
TeamPCP's Cloud Stealer worked because in an open execution pipeline, secrets are typically injected into the runner's process memory where they remain accessible for the job's duration. In a governed execution pipeline, this exposure is structurally reduced: secrets can be resolved from your controlled vault at the point they're needed, rather than broadcast as environment variables to every step in the pipeline.
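To make the contrast concrete, here is a minimal sketch of just-in-time resolution inside the single step that needs a token, assuming a HashiCorp Vault KV store and the `hvac` Python client. The mount path, role, and helper below are hypothetical; this illustrates the pattern, not the Delegate's internal implementation.

```python
# Generic sketch of just-in-time secret resolution (not Harness internals):
# the secret is fetched from Vault at the step that needs it, used, and never
# exported as a pipeline-wide environment variable. Paths and names are
# hypothetical placeholders.
import hvac

def publish_step() -> None:
    client = hvac.Client(url="https://vault.internal.example.com")
    # Authenticate with a short-lived, workload-scoped method (AppRole here).
    client.auth.approle.login(role_id="ci-publisher", secret_id="<injected-per-run>")

    # Read the token only inside the step that needs it.
    secret = client.secrets.kv.v2.read_secret_version(path="ci/npm-publish")
    npm_token = secret["data"]["data"]["token"]

    run_npm_publish(npm_token)  # hypothetical helper; token never hits os.environ

def run_npm_publish(token: str) -> None:
    ...
```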
An important caveat: Vault-based resolution alone doesn't eliminate runtime exfiltration. Once a secret is resolved and passed to a step that legitimately needs it — say, an npm token during npm publish — that secret exists in the step's runtime. If malicious code is executing in that same context (for example, a tampered package.json that exfiltrates credentials during npm run test), the secret is exposed regardless of where it came from. This is why the three walls work as a system: Wall 2 reduces the surface of secret exposure, Wall 1 blocks the exfiltration path, and (as we'll see) Wall 3 limits the blast radius to the scoped environment. No single wall is sufficient on its own.
To further strengthen how pipelines use secrets, leverage ephemeral credentials — AWS STS temporary tokens, Vault dynamic secrets, or GCP short-lived service account tokens — that auto-expire after a defined window, often minutes. Even if TeamPCP’s memory scraper extracted an ephemeral credential, it likely would have expired before the attacker could pivot to the next target.
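For example, a pipeline step can mint AWS credentials that live only a few minutes. A minimal boto3 sketch, assuming an STS-assumable deploy role; the role ARN and session name are hypothetical:

```python
# Illustrative only: request short-lived AWS credentials for a single pipeline
# task instead of storing long-lived keys. The role ARN is a placeholder.
import boto3

def short_lived_deploy_credentials() -> dict:
    sts = boto3.client("sts")
    resp = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/ci-deploy",  # hypothetical role
        RoleSessionName="pipeline-run-1234",
        DurationSeconds=900,  # 15 minutes: expires before a stolen copy is useful
    )
    return resp["Credentials"]  # AccessKeyId, SecretAccessKey, SessionToken, Expiration
```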
The Architecture.
Harness supports environment-scoped delegates as a core architecture pattern. Your "Dev" scanner delegate runs in a different cluster, with different network boundaries and different credentials, than your "Prod" deployment delegate.
The Defense.
The credential cascade that defined TeamPCP hits a dead end. Stolen Dev credentials cannot reach Production publishing gates or AI API keys, because those credentials live in a different vault, resolved by a different delegate, in a different network segment. If the Trivy compromise only yielded credentials scoped to a dev environment, the attack stops at phase one.
Beyond the walls, governed execution pipelines provide additional structural controls:
Architecture is a foundation, not a guarantee. Governed execution pipelines are materially safer against this class of attack, but you can still create avoidable risk by running unvetted containers on delegates, skipping egress filtering, using the same delegate across dev and prod, granting overly broad cloud access, exposing excessive secrets to jobs that don't need them, or using long-lived static credentials when ephemeral alternatives exist.
I am not claiming that Harness is safe and GitHub Actions is unsafe. That would be too simplistic.
What I am claiming is that governed execution pipelines — where what runs is controlled through policy gates, customer-owned infrastructure, scoped credentials, and immutable references — are a materially safer foundation than open execution pipelines. We designed Harness as our implementation of a governed execution pipeline. But architecture is a starting point — you still have to operate it well.
As we enter the era of Agentic AI — where AI is generating pipelines, suggesting dependencies, and submitting pull requests at machine speed — we can no longer rely on human review to catch a malicious tag in an AI-generated PR.
But there's a more fundamental shift: AI agents will become the primary actors inside CI/CD pipelines. Not just generating code — autonomously executing tasks, selecting dependencies, making deployment decisions, remediating incidents.
Now imagine an AI agent in an open execution pipeline — downloaded from a public marketplace, referenced by a mutable tag, executing with full privileges, making dynamic runtime decisions you didn't define. It has access to your secrets, your cloud credentials, and your deployment infrastructure. Unlike a static script, an agent makes decisions at runtime — fetching resources, calling APIs, modifying files.
If TeamPCP showed us what happens when a static scanner is compromised, imagine what happens when an autonomous AI agent is compromised — or simply makes a decision you didn't anticipate.
This is why governed execution pipelines aren't just a security improvement — they're an architectural prerequisite for the AI era. In a governed pipeline, even an AI agent operates within structural boundaries: it runs on infrastructure you control, accesses only scoped secrets, has restricted egress, and its actions are audited. The agent may be autonomous, but the pipeline constrains what it can reach.
The questions every engineering leader should be asking:
If you use Trivy, Checkmarx, or LiteLLM:
If you use GitHub Actions:
For the longer term:
I'm writing this as the CEO of a company that competes with GitHub in the CI/CD space. I want to be transparent about that.
But I'm also writing this as someone who has spent two decades building infrastructure software and who saw this threat model coming. When we designed Harness, the open execution pipeline model had already evolved from Jenkins plugins to GitHub Actions — each generation making it easier for third-party code to run with full privileges and, by moving execution further from the customer's network perimeter, making exfiltration easier. We deliberately chose to build governed execution pipelines instead.
The TeamPCP campaign didn't teach us anything new about the risk. What it did was make the difference between open and governed execution impossible for the rest of the industry to ignore.
Open source security tools are invaluable. The developers and companies who build them — including Aqua Security and Checkmarx — are doing essential work. The problem isn't the tools. The problem is running them inside open execution pipelines where third-party code has full privileges, secrets sit in memory, and exfiltration faces no structural barrier.
If you want to explore how the delegate architecture works in practice, we're here to show you. But more importantly, regardless of what platform you choose, please take these structural questions seriously. The next TeamPCP is already studying the credential graph.


“We’ve been operating in a hybrid environment with both OpenTofu and Terragrunt, and Harness has made it much easier to bring those workflows together into a single, consistent platform with IaCM. The addition of Terragrunt support is a valuable step toward simplifying how we manage infrastructure at scale.”
— Lead Platform Engineer, Enterprise Customer
Infrastructure as Code is now a standard for modern cloud operations, with most enterprises using IaC to provision and manage environments. However, as adoption grows, so does complexity. Teams are no longer managing a handful of environments. They are operating across multiple regions, accounts, and services, often at massive scale.
This is where traditional approaches begin to fall short.
As organizations scale their infrastructure, Terraform alone is often not enough. Teams adopt Terragrunt to manage complex, multi-environment deployments, but they are often forced to stitch together fragmented tooling that lacks visibility, governance, and consistency.
At Harness, we are changing that.
Today, we are excited to announce native Terragrunt support in Harness IaCM, bringing it to full parity with Terraform and OpenTofu while delivering capabilities that go beyond what is available in standalone tooling. This is more than support. It is about making Terragrunt a first-class part of an enterprise infrastructure management platform.
With Harness IaCM, teams can now:

Terragrunt has become a critical layer for managing infrastructure at scale because it simplifies how teams structure and reuse configurations across environments. Harness builds on that foundation with deep, native integration, enabling platform teams to operate with both flexibility and control.
This is especially important for enterprises where a single deployment spans multiple environments and services. Harness abstracts that complexity while maintaining governance, auditability, and consistency.
Terragrunt is part of a broader shift toward multi-tool infrastructure strategies.
Modern teams are no longer standardized on a single IaC tool. Instead, they operate across:

This creates challenges around consistency, visibility, and governance. Harness IaCM is built for this reality. We are evolving IaCM into a unified control plane for multi-IaC workflows, where teams can manage different frameworks with a consistent experience, shared policies, and centralized visibility.
This means:
Instead of managing infrastructure in silos, teams can now operate from a single platform across the entire lifecycle.
The next phase of Infrastructure as Code is not just about supporting more tools. It is about making infrastructure systems more intelligent and automated.
We are investing in two key areas:
We are continuing to support modern frameworks like AWS CDK, enabling developer-centric infrastructure workflows alongside provisioning, configuration, and orchestration tools.
We are introducing intelligence into IaC workflows to simplify tasks such as drift management and optimization. This helps teams reduce manual effort and operate more efficiently at scale.
Together, these investments move IaCM toward a unified, multi-IaC platform that combines flexibility, governance, and automation. Terragrunt has become essential for managing infrastructure at scale, but until now it hasn't had a platform that truly supports it. As infrastructure continues to grow in complexity, our focus remains the same: helping teams move faster, reduce risk, and scale with confidence, no matter which IaC tools they use.


The release of Anthropic Mythos and Project Glasswing marks an exciting and pivotal new chapter in software development. As the industry advances, the speed and economics of vulnerability exploitation have fundamentally shifted. What once took weeks of manual reconnaissance can now be scaled rapidly through automated models. However, this is not just a security problem to solve. It is a massive engineering opportunity to build cleaner, more robust systems. By leaning into AI-accelerated defense, engineering teams are uniquely positioned to lead the charge and redesign the landscape of modern software architecture.
To succeed in this new era, the traditional silos separating security and engineering must fall. Defense at machine speed requires a unified front.
The foundation of AI-accelerated defense relies on sound, proactive engineering practices. Developers must take ownership of architectural hygiene from the ground up.
Even with the best architecture, unexpected friction will occur. Resilient engineering means planning comprehensively for your ecosystem.
To keep pace with the increased velocity of engineering teams, security teams must also evolve their operational models.
Engineering leaders and developers are in the perfect position to navigate this industry inflection point. By taking ownership of these structural changes today, you ensure the long-term viability of your products and the enduring strength of your codebase. Bring your security, infrastructure, and engineering teams together into the same room and start building your shared roadmap today.


What happens when your Infrastructure as Code management strategy works perfectly in dev, scales reasonably well in staging, and then quietly fractures across seventeen production workspaces because nobody documented which Terragrunt wrapper goes with which AWS account? You spend Friday afternoon reverse-engineering DRY patterns that made sense six months ago, wondering why your team is managing three different IaC execution engines with four incompatible workflow philosophies.
This scenario isn't hypothetical. It's the reality of organizations that adopted IaC incrementally, layer by layer, without a unified management approach. One team standardized on OpenTofu for new infrastructure. Another maintained legacy Terraform configurations because migration felt risky. A third discovered Terragrunt and used it to wrangle complexity across AWS regions, but now those wrappers exist outside any centralized governance model. Each decision was rational in isolation. Together, they created an orchestration problem masquerading as a tooling problem.
The actual challenge isn't choosing between Terraform, OpenTofu, or Terragrunt. It's managing their outputs, enforcing policy consistently across execution contexts, and ensuring that infrastructure changes don't outpace your ability to understand what's deployed.
Most platform teams don't set out to run multiple IaC tools simultaneously. They inherit Terraform state from acquisitions, adopt OpenTofu for licensing predictability, and introduce Terragrunt because someone needed to stop copying backend configurations across 40 AWS accounts. The tools themselves aren't the problem. The problem is that each tool introduces its own state management assumptions, module resolution logic, and workflow expectations.
Terragrunt, for instance, exists specifically to solve Terraform's verbosity problem. It lets you define backend configurations once and reference them across environments. It supports dependency graphs so you can deploy a VPC before attempting to create subnets. These capabilities are valuable, but they also mean your actual infrastructure logic now spans two layers: the Terraform or OpenTofu code that defines resources, and the Terragrunt configuration that orchestrates execution.
When you lack centralized Infrastructure as Code management, those layers drift independently. Someone updates a Terragrunt dependency graph without realizing it breaks a downstream workspace. Another engineer modifies an OpenTofu module but forgets that three different Terragrunt configurations depend on its output structure. You don't discover these issues until a deployment fails in production, and the postmortem reveals that nobody had visibility into the full dependency chain.
The typical response to multi-IaC complexity is to standardize on one tool and deprecate the others. That works if you're early in your IaC journey. It's impractical if you're managing hundreds of workspaces across regulated environments where compliance audits expect immutable infrastructure definitions and audit trails for every state change.
Here's what actually happens: platform teams create custom CI/CD pipelines for each tool. Terraform runs in Jenkins. OpenTofu runs in GitHub Actions. Terragrunt configurations use a shell script someone wrote during an incident. Each pipeline implements drift detection differently. Policy enforcement exists as scattered OPA rules that don't share a common evaluation context. When an auditor asks, "How do you prevent unapproved infrastructure changes?", the honest answer is, "We run some checks in some places, and we hope teams remember to use them."
This isn't negligence. It's what emerges when Infrastructure as Code management tooling doesn't natively support the reality of polyglot IaC environments. Teams need a system that treats OpenTofu, Terraform, and Terragrunt as execution details, not architectural boundaries. The workflow layer—plan generation, policy evaluation, approval gates, state locking—should remain consistent regardless of which engine interprets the configuration.
Running `terragrunt apply` successfully doesn't mean your infrastructure is well-managed. It means Terragrunt successfully invoked OpenTofu or Terraform and applied a configuration. The actual management work—validating inputs, enforcing cost policies, detecting drift, promoting changes through environments—exists outside the execution layer.
This is where most homegrown solutions collapse under their own weight. You build a wrapper script that runs Terragrunt with the right flags. Then you add pre-commit hooks for policy checks. Then you integrate Sentinel or OPA, but only for workspaces that someone remembered to configure. Then you add Slack notifications so people know when drift occurs, but the notifications don't include enough context to act on them. Eventually, you have a Rube Goldberg machine that works until it doesn't, and debugging requires institutional knowledge that exists in one person's head.
The fundamental issue is that IaC workflow optimization requires thinking beyond execution engines. You need orchestration that understands module dependencies, workspace relationships, and policy boundaries. You need variable management that doesn't require copying YAML files between repositories. You need drift detection that runs automatically and surfaces meaningful deltas, not raw Terraform output dumped into a log file.
Treating Terragrunt as an afterthought—something teams bolt onto existing Terraform or OpenTofu pipelines—misses its architectural intent. Terragrunt exists because managing backend configurations, passing outputs between modules, and orchestrating multi-account deployments shouldn't require copying boilerplate across dozens of directories. When Infrastructure as Code management platforms support Terragrunt natively, they acknowledge this reality: the DRY principle applies to infrastructure orchestration, not just resource definitions.
Native Terragrunt support means the platform understands dependency graphs without requiring custom parsing logic. It means workspace templates can reference Terragrunt configurations directly, rather than forcing teams to flatten everything into monolithic Terraform modules. It means policy enforcement applies before Terragrunt invokes the underlying execution engine, catching invalid configurations before they generate failed plans.
This matters most in organizations running multi-region or multi-cloud architectures. A typical pattern: one Terragrunt configuration defines networking across AWS regions, another manages Kubernetes clusters, a third provisions databases. Each configuration depends on outputs from the others. Without native orchestration, teams either write brittle shell scripts to sequence these dependencies or accept that deployments sometimes fail halfway through because someone applied changes out of order.
The real test of an Infrastructure as Code management platform isn't whether it runs OpenTofu or Terraform. It's whether it provides consistent state visibility, policy enforcement, and audit trails across both. If your platform requires separate workflows for each execution engine, you've automated the mechanics but not the governance.
Consider policy evaluation. A reasonable security requirement: no S3 buckets should allow public read access. With fragmented tooling, you implement this rule multiple times. Once for Terraform workspaces using Sentinel. Again for OpenTofu configurations using OPA. A third time for Terragrunt-managed infrastructure, where you're not sure which policy engine applies because Terragrunt is just orchestrating calls to Terraform or OpenTofu. When an audit occurs, you can't prove consistent enforcement because there's no unified policy evaluation layer.
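Sentinel and OPA are the usual engines for this rule. Purely to illustrate the idea of one rule evaluated once, regardless of execution engine, here is a plain-Python sketch that inspects the JSON plan output (`terraform show -json` or `tofu show -json`) that Terraform, OpenTofu, and Terragrunt-wrapped runs all produce. The attribute names assume the AWS provider and may differ by provider version.

```python
# A plain-Python illustration (not Sentinel or OPA): evaluate one rule against
# the JSON plan produced by Terraform, OpenTofu, or a Terragrunt-invoked run.
import json
import sys

def public_s3_violations(plan_path: str) -> list[str]:
    with open(plan_path) as f:
        plan = json.load(f)
    violations = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if rc["type"] in ("aws_s3_bucket", "aws_s3_bucket_acl") and \
                after.get("acl") in ("public-read", "public-read-write"):
            violations.append(rc["address"])
    return violations

if __name__ == "__main__":
    bad = public_s3_violations(sys.argv[1])
    if bad:
        print("Public S3 buckets are not allowed:", ", ".join(bad))
        sys.exit(1)
```

The point is not the check itself but where it lives: one rule, one evaluation layer, applied to every plan before anything is applied.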
The same fragmentation affects drift detection. Terraform Cloud detects drift for Terraform-managed resources. Your OpenTofu workspaces might run scheduled reconciliation jobs, or they might not—it depends on whether someone configured them. Terragrunt configurations drift silently unless you've built custom tooling to periodically run `terragrunt plan` and parse the output. The result: partial visibility across your infrastructure estate, where "managed by IaC" becomes aspirational rather than descriptive.
Organizations exploring Terraform alternatives often focus on licensing or community governance. Those considerations matter, but they don't address the operational question: how do you manage infrastructure deployed with multiple execution engines without creating parallel workflow systems?
OpenTofu integration means more than "we can run OpenTofu commands." It means workspaces provisioned for OpenTofu behave identically to Terraform workspaces at the orchestration layer. Variable sets apply consistently. Policy evaluation uses the same rule sets. Drift detection runs on the same schedule. Approval workflows follow the same governance model. The execution engine becomes an implementation detail, not a workflow boundary.
This distinction matters during migrations. Teams don't flip entire infrastructure estates from Terraform to OpenTofu overnight. They migrate incrementally, starting with non-critical workspaces and expanding as confidence grows. If your Infrastructure as Code management platform treats each engine as a separate silo, you're managing two parallel systems during the transition. If the platform abstracts execution details behind a unified orchestration layer, the migration becomes a configuration change, not an architectural overhaul.
The hard problems in infrastructure management aren't technical; they're organizational. How do you ensure that 40 engineers across six teams follow the same approval process for production changes? How do you enforce cost policies without blocking legitimate deployments? How do you maintain audit trails that satisfy compliance requirements without turning every infrastructure change into a bureaucratic ordeal?
IaC orchestration platforms solve these problems by decoupling policy from execution. Instead of embedding governance rules in CI/CD pipelines—where they're invisible, untestable, and easy to bypass—you define them once at the platform level. Instead of writing custom scripts to sequence Terragrunt dependencies, you describe the dependency graph declaratively and let the platform handle execution order. Instead of building bespoke drift detection logic, you configure detection schedules and let the platform surface meaningful deltas.
This approach doesn't eliminate complexity. It consolidates complexity into a layer designed to manage it. Your IaC configurations remain simple: modules that define resources, Terragrunt wrappers that eliminate boilerplate, workspace configurations that specify execution context. The orchestration platform handles everything else: state locking, policy evaluation, approval workflows, audit logging, drift remediation.
Harness Infrastructure as Code Management approaches these challenges by treating the execution engine as a deployment detail, not an architectural constraint. Whether you're running OpenTofu, Terraform, or Terragrunt, the orchestration layer remains consistent: standardized pipelines for plan generation and apply operations, unified policy enforcement across all workspaces, centralized drift detection that surfaces actionable insights.
For teams managing infrastructure across multiple clouds, regions, or execution engines, Harness IaCM provides the orchestration layer that makes polyglot IaC environments manageable. The platform doesn't force you to standardize on a single tool. It provides governance, visibility, and workflow consistency regardless of which engine interprets your configurations.
The promise of Infrastructure as Code—reproducible deployments, version-controlled infrastructure, collaborative development—only materializes when you have consistent orchestration across execution engines. Running Terraform in one pipeline, OpenTofu in another, and Terragrunt through a shell script doesn't scale. It creates workflow fragmentation that defeats governance and slows teams down.
Effective Infrastructure as Code management platforms abstract execution details behind unified workflows. They treat Terragrunt as a first-class orchestration primitive, not an afterthought. They provide native support for OpenTofu alongside Terraform, recognizing that organizations migrate gradually, not overnight. Most importantly, they enforce policy, detect drift, and maintain audit trails consistently across all workspaces, regardless of which engine runs the actual infrastructure changes.
The technical lesson: orchestration complexity belongs in platforms designed to manage it, not scattered across custom scripts and fragmented CI/CD pipelines. The operational lesson: governance doesn't slow teams down when it's embedded in the workflow rather than bolted on afterward. Multi-IaC environments are manageable when you have the right orchestration layer. Without it, you're just running tools in parallel and hoping they don't conflict.
Explore how Harness Infrastructure as Code Management handles multi-IaC orchestration, or review the technical documentation for implementation details. The product roadmap outlines upcoming capabilities for workflow optimization and policy enforcement.




Most development teams today build everything around Git and deploy with GitOps principles.
Code sits in version control, changes go through PRs, and deployments are handled through modern CI/CD. That part is pretty standard at this point, especially when using a modern DevOps platform like Harness.
MongoDB fits into that developer world and workflow pretty naturally. Data is stored in documents that look a lot like JSON, the format many developers already use in application code and APIs. Under the hood, MongoDB stores those documents as BSON, which is essentially a binary form of JSON that supports additional data types like dates, object IDs, and binary data. That means developers get a familiar model to work with, while MongoDB gets a format that is efficient for storing and querying application data.

Looks just like JSON, with native types like ObjectId and dates powered by BSON.
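For instance, a minimal document might look like this in pymongo, where `ObjectId` and `datetime` round-trip as native BSON types rather than strings. The collection and field names are illustrative.

```python
# Illustrative document: field names are made up, but ObjectId and datetime
# are stored as native BSON types rather than strings.
from datetime import datetime, timezone
from bson import ObjectId
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["shop"]["orders"]

orders.insert_one({
    "_id": ObjectId(),                       # BSON ObjectId, not a string
    "userId": ObjectId("64f1a2b3c4d5e6f7a8b9c0d1"),
    "items": [{"sku": "A-100", "qty": 2}],
    "total": 59.98,
    "createdAt": datetime.now(timezone.utc),  # stored as a BSON date
})
```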
The tradeoff is that structure isn’t always defined upfront. Schemas change over time, and not always in a clean or consistent way.
Collections can contain documents with different shapes. Index changes can directly impact performance. These aren’t problems on their own, but they require discipline to manage safely.
MongoDB changes are often handled outside the standard development workflow, whether that’s by developers, platform teams, or database teams.
Teams rely on application-level updates or one-off scripts to backfill data, modify structures, or create indexes. These approaches work, but they’re not always consistently versioned in Git. Execution can vary across environments, and review or validation is often informal.
The result is limited visibility into what changed, when it changed, and how it was applied. Over time, that leads to inconsistencies between environments and increased risk during deployment.
Flexibility is powerful, but without proper controls it introduces risk.
To solve this, teams need to bring MongoDB changes into the same workflow they already trust for application code: Git-driven, reviewable, and automated.
GitOps for MongoDB isn’t about changing how Mongo works. It’s about changing how changes are managed.
Instead of handling updates through scripts or application logic alone, database changes are treated like application code. Index creation, schema validation rules, and migration scripts are all defined in Git and tracked over time. This includes MongoDB’s native schema validation rules, which can be versioned and applied consistently across environments.
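As a small illustration of what "versioned schema validation" can mean in practice, here is a sketch of a `$jsonSchema` validator applied with `collMod` through pymongo. In a Git-driven workflow this definition would live in the repository and be applied by the pipeline rather than run ad hoc; the collection and field names are examples.

```python
# Illustrative sketch: a versionable $jsonSchema validator applied with
# collMod. Field names are examples, not a prescribed schema.
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["app"]

db.command("collMod", "users", validator={
    "$jsonSchema": {
        "bsonType": "object",
        "required": ["email", "createdAt"],
        "properties": {
            "email": {"bsonType": "string"},
            "createdAt": {"bsonType": "date"},
            "userPreferences": {"bsonType": "object"},
        },
    }
}, validationLevel="moderate")
```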
Changes need to go through pull requests, just like any other code change. This allows developers, platform teams, and DBAs to review what’s being modified before anything runs in an environment.
From there, pipelines handle the validation and deployment. Changes are applied consistently across environments, rather than being run manually and potentially differently each time.
In practice, this means a new field, an index, or a backfill isn’t just a script someone runs once. It’s a versioned change that can be reviewed, tested, and repeated.
This isn’t about forcing rigid schemas onto MongoDB. It’s about making changes visible, consistent, and easier to manage as systems grow.
Harness DB DevOps provides the structure to do this. With Harness, we define changes as changesets, store them in Git, and deploy them through pipelines with built-in validation and policy checks.
To demonstrate how this works, we will walk through a practical MongoDB change from start to finish.
Here’s a simple example: A team needs to add a new userPreferences field to the users collection and create an index to support a new query.
Instead of writing a script and running it manually, we define the change and commit it to Git.

1. Define the change in Git
A developer creates the update as a changeset. That includes the logic to add or backfill the new field, along with the createIndex operation needed for performance. The change is committed alongside application code, like any other update.
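The exact changeset format is covered in the Harness DB DevOps docs, but the operations the changeset captures look roughly like this pymongo sketch; the field and index names below are illustrative.

```python
# Rough sketch of the operations the changeset captures; in practice this
# logic lives in a versioned changeset file, not an ad hoc script.
from pymongo import MongoClient, ASCENDING

db = MongoClient("mongodb://localhost:27017")["app"]

# Backfill: give existing users an empty userPreferences object.
db.users.update_many(
    {"userPreferences": {"$exists": False}},
    {"$set": {"userPreferences": {}}},
)

# Index to support the new query path (example field shown).
db.users.create_index(
    [("userPreferences.notifications", ASCENDING)],
    name="idx_user_preferences_notifications",
)
```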
2. Open a pull request
From there, the change goes through a pull request. Other developers or DBAs can review what’s being changed before anything runs. If something looks off, it gets caught here instead of in production.
3. Let the pipeline take over
Once the change is approved, the pipeline takes over.
The Pipeline

Before anything gets applied, the change is validated and previewed against the target environment. This helps catch issues early, whether it’s a conflict, a bad query pattern, or something that could impact performance.
This is especially important for heavy operations like index creation on massive collections, where resource contention and performance degradation are real risks. Instead of running those changes manually, pipelines can enforce safe rollout strategies like rolling index creations across replica sets, without manual intervention.
Policies are enforced as part of that same process, with required approvals, environment rules, and other guardrails checked automatically so teams aren’t relying on someone to manually verify every step.
Once everything passes, the change is deployed through the pipeline and applied consistently across environments, moving from dev to staging to production in a controlled way. No one is logging into a database to run scripts by hand.
Now, everything is tracked. You can see what was applied, where it was deployed, when it happened, and who approved it, with a full history available if something needs to be reviewed or rolled back later.
Sound familiar? This workflow mirrors application delivery, where changes are versioned, reviewed, validated before deployment, and visible afterward.
Traditionally, database changes have been tightly controlled by DBAs. They review scripts, approve changes, and sometimes execute them manually in each environment. That model helps reduce risk, but it doesn’t scale as teams grow and release more frequently.
With a GitOps approach, that control doesn’t disappear, it moves earlier in the process.
Instead of reviewing every individual change, database teams define policies and standards up front. Those rules are then enforced automatically through pipelines. Every change must pass the same checks before it reaches an environment, without requiring manual intervention each time.
In practice, this means:
The role of the database team evolves from gatekeeper to system designer. Rather than being involved in every deployment, they define the guardrails that ensure every deployment is safe.
Developers still move quickly, but now within a controlled, repeatable system.
Bringing MongoDB into a Git-driven workflow changes how teams ship.
MongoDB's flexibility doesn't eliminate the need for structure; it just shifts the responsibility for maintaining consistency from the database itself to your development processes.
If your application is managed through Git, your database should be too.


If you've ever run an ALTER TABLE on a busy MySQL table in production, you know the feeling. The change is small. The risk isn't. Long-running table locks, queued writes, application timeouts, replication lag, a five-minute migration that turns into a half-hour incident review.
We're shipping an integration that takes that anxiety out of the loop. Harness Database DevOps now supports Percona Toolkit for MySQL as part of Liquibase-based schema management. Flip a checkbox at schema creation, and eligible changes execute through pt-online-schema-change instead of native MySQL DDL.
Native ALTER TABLE on MySQL can lock tables for as long as the change takes to apply. On a large or hot table, that means writes pile up, dependent services start timing out, and replicas fall behind.
Percona Toolkit handles the same change very differently. pt-online-schema-change creates a shadow table with the new schema, copies your data over in small chunks, uses triggers to keep the original and shadow tables in sync, then performs an atomic swap with minimal lock time. The practical upside: schema changes you can run during business hours, not at 2 AM with a runbook open.
The integration is enabled per schema. When you create a Database Schema in Harness DB DevOps:
That's it. With the box unchecked (the default), Harness DB DevOps applies your changelogs using native MySQL operations through Liquibase, exactly as before. Check it, and eligible changes route through Percona Toolkit instead.
Percona Toolkit isn't a silver bullet for every DDL. A few cases need extra thought.
Adding or dropping foreign keys can break during the table swap, so plan those changes carefully or apply them outside the toolkit. Tables without a primary key or unique index won't migrate safely either, since pt-online-schema-change needs one to chunk data deterministically. And a handful of specific operations sit outside the safe-change envelope: dropping a primary key, complex column reordering, and some storage engine swaps.
You'll also want to give the database user the right privileges: ALTER, SELECT, INSERT, and UPDATE on the target table, plus CREATE and DROP on the database for shadow table management.
The full list of supported patterns, edge cases, and required permissions is in the Harness DB DevOps docs.
If you're already running Harness DB DevOps for MySQL, the next schema you create is a good place to try this. Turn it on against a non-critical environment first, watch how it behaves on your workload, and the path to using it in production gets a lot shorter.
For teams running MySQL at scale, that's one fewer reason to schedule schema changes around your customers' sleep.
If you aren't already using Database DevOps, speak with our experts to discuss how you can achieve zero-downtime database schema migrations.
Your production problems aren't just random. If a Kubernetes node fails every 72 hours or your CI runners crash every 4 builds, that's a clear pattern. Mean Time to Failure (MTTF) turns these failures into data that you can control, plan for, and improve over time.
For platform engineering leaders, MTTF should not be a decoration on a dashboard; it should be a decision-making tool. With the right calculations, you can set realistic SLOs, plan capacity, and cut down on developer toil by focusing on the parts that break most often. You'll get exact formulas for distributed systems, data collection patterns that avoid common mistakes, and a playbook to turn reliability improvements into measurable ROI through automated resilience practices alongside faster recovery metrics.
Stop letting unpredictable failures drain your team's time and budget. With Harness Continuous Integration and Continuous Delivery, you can turn MTTF insights into concrete pipeline changes, progressive delivery strategies, and guardrails that keep reliability improving release after release.
Mean Time to Failure (MTTF) is the average operating time of non-repairable components before failure across a population.
At a basic level:
MTTF = total operating time ÷ number of failures
If 100 CI runners each run for 50 hours during a week (5,000 runner‑hours total) and 20 runners experience at least one hard failure, then:
MTTF = 5,000 ÷ 20 = 250 hours
Historically, MTTF is used for physical assets you replace instead of fix (light bulbs, disks, sealed devices). In software, the same concept fits ephemeral resources such as pods, spot instances, and short-lived CI runners.
MTTF tells you how long things run, on average, before they fail and must be replaced. MTTF is an approximation, not a strict reliability model.
Three reliability metrics show up in every platform review: MTTF, MTBF, and MTTR.
Use them to answer different questions:
For example:
Your platform scorecards should display all three together, alongside SLO health and error budget burn, so teams see the full reliability picture instead of optimizing a single metric in isolation.
The theoretical rules around MTTF and MTBF are straightforward; the ambiguity comes when you apply them to real cloud‑native stacks. Concrete examples help.
These components typically behave like non‑repairable items: pods, ephemeral CI runners, and spot or preemptible nodes that you replace rather than repair.
For each of these, you can treat a single lifecycle (from start to failure/termination) as one observation in your MTTF dataset.
These components behave more like classic repairable systems: databases and other long-running services that you restore rather than replace.
For these, you care more about how much uptime you get between failures (MTBF) and how quickly you can restore full health (MTTR).
It is tempting to say “our nodes have an MTTF of 720 hours, so our service is very reliable.” That is only true if your architecture masks those failures from users. User‑facing reliability lives at the service boundary, measured via SLOs and error budgets; component MTTF is an input that helps you:
MTTF helps you understand where things break; SLOs and MTTR tell you how much that matters to customers.
The MTTF calculation is trivial. The work is in collecting honest data across a distributed system without losing important details.
For each component type, decide exactly what counts as “failed,” for example:
Document these in your platform taxonomy so every team logs and reports failures the same way.
For each instance in the population you’re measuring, capture:
Then compute:
MTTF = total operating time across all instances ÷ number of failed instances
This gives you MTTF for that class (e.g., “Linux GPU runners in prod”).
Never pool dissimilar components into a single MTTF number. Instead:
Example: one class of runners accumulates 1,000 exposure hours with 5 failures (MTTF 200 hours); another accumulates 100 exposure hours with 1 failure (MTTF 100 hours).
Fleet MTTF (weighted) = (1,000 + 100) ÷ (5 + 1) ≈ 183 hours, not the naive (200 + 100) ÷ 2 = 150 hours.
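A minimal sketch of the weighted calculation in Python, assuming per-class exposure hours and failure counts are already aggregated (the class names and numbers are illustrative):

```python
# Weighted fleet MTTF: pool exposure hours and failures; never average per-class MTTFs.
classes = {
    "linux-gpu-runners": {"exposure_hours": 1_000, "failures": 5},  # per-class MTTF = 200 h
    "spot-build-agents": {"exposure_hours": 100, "failures": 1},    # per-class MTTF = 100 h
}

total_exposure = sum(c["exposure_hours"] for c in classes.values())
total_failures = sum(c["failures"] for c in classes.values())

fleet_mttf = total_exposure / total_failures  # (1,000 + 100) / (5 + 1) ≈ 183 h
naive_mttf = sum(c["exposure_hours"] / c["failures"] for c in classes.values()) / len(classes)  # 150 h

print(f"Weighted fleet MTTF: {fleet_mttf:.0f} h (naive average would report {naive_mttf:.0f} h)")
```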
Some instances will still be running when you take the snapshot. If you drop them, you throw away their exposure hours and bias MTTF low.
When censored samples are common, use basic survival analysis (like Kaplan–Meier) so that "still running" instances add to the exposure instead of being thrown away. If you give them clear timestamps and labels, observability tools and data teams can usually take care of this for you.
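Short of full survival analysis, one honest approximation is to keep censored instances in the exposure total while counting only confirmed failures, consistent with the formula above. A sketch, assuming each record carries the hours an instance has run and whether that run ended in a hard failure (field names are illustrative):

```python
# Still-running (right-censored) instances contribute exposure hours but no failures.
records = [
    {"hours": 30, "failed": True},
    {"hours": 55, "failed": False},  # still running at snapshot time: keep its hours
    {"hours": 12, "failed": True},
    {"hours": 70, "failed": False},
]

exposure = sum(r["hours"] for r in records)
failures = sum(1 for r in records if r["failed"])

mttf_estimate = exposure / failures if failures else float("inf")
print(f"MTTF estimate with censored exposure included: {mttf_estimate:.0f} h")
```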
MTTF becomes strategically important when you use it to shape SLOs, error budgets, and reliability investments, not just track uptime.
If a class of components has an MTTF of 72 hours, a single instance will fail about:
8,760 hours/year ÷ 72 ≈ 121 failures/year
With multiple instances and redundancy, not every failure becomes a user‑visible incident, but you can still estimate:
MTTF highlights which components generate excessive manual work:
Use this to:
Because MTTF underpins incident rates, any improvement can be tied to measurable gains:
Treat MTTF as a leading indicator: when you raise it on critical components, you should see downstream improvements in SLO attainment and delivery cadence.
Once you know which components have the lowest MTTF and the highest operational cost, you can systematically improve them. In modern delivery pipelines, four patterns tend to pay off quickly.
Flaky CI is one of the most common sources of low MTTF and wasted engineering time.
You can improve CI‑related MTTF by:
Result: higher MTTF for pipelines and runners, fewer broken builds, and fewer interruptions for developers.
You cannot prevent every bad change, but you can limit how many become full‑blown incidents that count against your service‑level MTTF.
Key tactics:
This keeps effective MTTF for user‑facing services higher, even if underlying components still fail regularly.
Many MTTF regressions start as “just one more config change” that slips past informal reviews. Prevent those with:
This ensures the MTTF gains you’ve earned are not eroded by ad‑hoc changes and one‑off exceptions.
To sustainably raise MTTF, you need confidence that your architecture and runbooks can handle real failures, not just happy‑path tests.
By running targeted chaos experiments on the components with the lowest MTTF, you can:
When failures happen, MTTF tells you how often they occur. AI‑powered automation helps you decide what to do next—fast—so more failures stay under control and never become major incidents.
Harness AI‑assisted deployment verification analyzes metrics and logs during and after each deployment:
The result is fewer deployments turning into user‑visible failures and a higher effective MTTF for your services, because many problematic changes are automatically rolled back before customers notice.
On the CI side, AI‑driven analysis works with Test Intelligence and analytics to:
SLOs and error budgets turn raw data into rules. Instead of making teams watch dashboards and make decisions on their own, you can:
This completes the cycle: MTTF informs SLO design, SLOs define the guardrails, and AI-powered verification and rollbacks enforce those guardrails at machine speed.
Want to turn MTTF insights into automated reliability improvements?
Explore Harness CI/CD to reduce failure rates, enforce guardrails, and improve SLO performance.
MTTF can feel abstract until you have to justify reliability decisions or explain incident patterns to stakeholders. These FAQs break down the most common questions practitioners ask about MTTF and how it relates to other reliability metrics.
MTTF is the average time it takes for a group of parts, like pods or temporary CI runners, to fail in a way that can't be fixed. MTBF tells you how long systems you fix and put back into service, like databases or long-running services, are up and running before they break down again.
When you need to know how often failures happen so you can plan for redundancy or auto-healing, use MTTF. Use MTTR to find out how quickly you can fix services that users can see after they go down. Both metrics work together and are usually used to help make decisions about SLOs and error budgets.
MTTF estimates are very uncertain when there aren't many failures. To make the number more reliable, put similar workloads together, add up the exposure hours for each class, and think of MTTF as a range or trend instead of a single point. If a part didn't fail in your window, don't assume that it will never fail; instead, treat that as incomplete data.
Most of the time, MTTF is skewed by dropping instances that are still running when the measurement is taken (right-censoring), combining environments (staging, load, and production) into one metric, and having different or unclear definitions of failure across teams. Fixing these problems usually makes MTTF more useful than any other advanced statistical method.
MTTF is less useful when failures are rare or highly correlated, or when you're measuring systems that are repaired rather than replaced. In those cases, MTBF and MTTR, viewed through SLOs and error budgets, usually give better guidance than a single MTTF value.
When the MTTF is higher on important parts, there are fewer problems, fewer pages, and less time lost by developers fixing them. You can link improvements directly to faster safe release velocity, lower downtime risk, and lower operational costs when you combine MTTF with SLOs, error budgets, chaos engineering, and AI-powered automation.


Modern application security goes from code to runtime. Vulnerabilities are found at every stage of the software development lifecycle (SDLC) - in the code developers write, open source packages they pull in, container images they build, and cloud infrastructure where it all runs. But finding vulnerabilities is no longer enough. With attack surfaces sprawling across pipelines, registries, and production environments, the harder problem is fixing the vulnerabilities that actually matter.
Understanding what’s important increasingly depends on correlating multiple data points. A critical CVE buried in a dependency looks very different depending on whether the vulnerable function is actually reachable, the library is used in production, or the affected service is internet-facing. Without runtime context, security and development teams are often left triaging noise instead of actually reducing risk. And fixing vulnerabilities discovered in production can be challenging without being able to follow the trail back to the repo and line of code where the vulnerability can be found.
No matter where application security lives in your organization - and increasingly, it lives in more than one place - Harness and Wiz are working together to make sure you're covered. Whether your team is shifting left from cloud security or pushing right from the development pipeline, integrating Harness and Wiz brings code and runtime findings together so you always have the context you need to act.
Application security used to have a clear owner. The AppSec team ran the scanners, triaged findings, and created tickets for developers. But "shift left" has been pushing security earlier into the development process, and ownership has been migrating toward the teams that actually write and ship code. Today, the DevSecOps or platform engineering team owns application security tooling in many organizations. They're the ones who know exactly where a vulnerability lives in code, who owns it, and how to get developers to fix it.
But as applications move to the cloud, cloud security and infrastructure teams have a stake in application security outcomes as well. They're the ones with visibility into what's actually running in production - what's internet-facing, what's over-privileged, what's actively being exploited. Cloud security platforms have expanded their focus from purely infrastructure and runtime back through the SDLC to code. For many cloud teams, application security isn't a handoff; it feeds into their cloud risk picture.
The result is that application security now has multiple stakeholders with different vantage points. DevSecOps teams see risk through the lens of the CI/CD pipeline and the developer workflow. Cloud security teams see it through the lens of the deployed environment and the blast radius of a breach. Neither view is complete on its own. The good news is that these teams don't have to choose between their tools or their workflows. They need integration that lets each team work in their context while sharing the signals that make both more effective.
DevSecOps teams need to expand right. SAST and SCA tools often generate more findings than any team can fix. Runtime context helps separate signal from noise. Knowing that a vulnerable service is actively internet-facing or that a dependency with a critical CVE is actually loaded in production changes how a team prioritizes. Without it, developers are left triaging based on CVSS scores alone. With it, they can focus effort where exposure is real and the risk in production is highest.

Harness Security Testing Orchestration (STO) makes it easy to orchestrate Wiz Code across your CI/CD pipelines. With a pre-built integration, you can deploy Wiz Code in just a few clicks instead of needing to create a custom integration or write custom scripts. Harness orchestrates Wiz Code alongside all your other scanners so you know your pipelines always get the required security tests, without needing to manually coordinate multiple tools.

Once Wiz Code is integrated, STO aggregates findings with other scanners in your pipeline, automatically deduplicating vulnerabilities so teams aren't triaging the same issue twice. The consolidated view means developers and security engineers can see the full picture in one place, understanding pipeline-level risk and assigning tickets to developers. In addition, Harness Policy as Code lets teams define and take action at the pipeline level instead of tool by tool, so decisions about what to fail a build on, what to flag for review, and what to pass through are applied consistently and holistically across every scan and pipeline.
Cloud security is pushing left - past runtime, past containers, all the way back to the code and open source packages that vulnerabilities originate from. The driver is enabling action. A misconfigured cloud resource or a vulnerable container image is more actionable when you can tie it back to the specific dependency introduced in a pull request, the developer who owns the code, and the pipeline that shipped it. Runtime findings without code context are just alerts. With code context, they become actionable work items that can be routed to the right person and fixed at the source.
Wiz Application Security Posture Management (ASPM) is designed to aggregate findings from across the SDLC and correlate them with runtime context - what's deployed, what's exposed, and what's actually at risk. By integrating Harness SAST and SCA scanner findings directly into Wiz, cloud security teams can connect the dots between a vulnerable open source package or insecure code pattern and the running workloads it affects. That correlation is what turns a list of CVEs into a prioritized risk picture that reflects what's actually happening in production.
For cloud security teams already working in Wiz, this integration means Harness SAST and SCA become part of their existing workflow rather than a separate tool to check. Code-level findings surface alongside runtime signals in the same platform where cloud risk is already being managed, analyzed, and acted on. Teams get broader coverage without adding friction, and the context that makes those findings meaningful - reachability, exposure, business criticality - is already there when they need it.
DevSecOps and cloud security teams are generally not competing - they're looking at risk from different angles. One team lives in the development pipeline; the other lives in the cloud. Both need visibility into what the other sees to do their jobs well. When those views are siloed, findings get duplicated, priorities diverge, and the vulnerabilities that matter most fall through the cracks between teams.
Harness and Wiz close that gap from both directions. DevSecOps teams get runtime signals from Wiz Code inside the pipeline context where they already work, so they can prioritize fixes based on real-world exposure. Cloud security teams get code-level findings from Harness SAST and SCA inside the risk context where they already work, so they can trace production risk back to its source. Each team keeps their workflow. Both teams get the full picture.
The right combination of these integrations depends on how your organization is structured, where application security ownership sits today, and where you want it to go. If you're a Wiz customer evaluating how Harness SAST and SCA fit into your security program, or a Harness customer looking to bring runtime context into your pipelines, contact your Harness account team to understand how you can map the integrations to your specific environment.


Application security testing tools promise coverage and accuracy, but teams often struggle just to get started. One of the biggest friction points in dynamic application security testing is configuring authentication correctly so a scanner can even access a target application, let alone API endpoints that power the functionality.
Whether it’s API keys, bearer tokens, or custom auth flows, setting up authentication for scans frequently requires trial-and-error and engineering support. This reality of scanning configuration slows down security validations, delays insights, and makes it difficult to integrate with AI-driven tooling that depends on fast, accurate access to API endpoints.
Today, we’re excited to introduce AI-Powered Custom Authentication Generation—a new capability designed to eliminate this friction and help teams move from setup to security insights faster than ever.
With this release, teams can now generate and refine authentication configurations using natural language and LLMs. Instead of manually configuring authentication logic or relying on additional support, users can simply describe their requirements and let AI handle the rest.
The average time to configure authentication for API security testing is measured in seconds, whereas older manual approaches can take hours and require extensive trial-and-error.
Here are a few highlights:
Authentication setup has long been one of the most frustrating parts of security scanning. Access control mechanisms are already complex due to security hardening used to protect applications and APIs. Successfully automating authentication flows so a machine can access an app or endpoint raises the bar substantially.
Some of the common pain points include:
What should be a simple prerequisite, gaining authenticated context into an application, becomes a major bottleneck to dynamic application security testing.
The new AI-powered authentication feature in Harness API Testing removes these barriers entirely by reworking how authentication config is created and managed.
Users can navigate to the authentication configuration page, select the custom option, and simply describe what they need. For example:
“Generate an API key-based authentication hook where the token <token> is injected into the request header <authorization>.”
With a single click on “Generate with AI,” the system produces a complete, ready-to-use authentication script. This functionality eliminates the need to write or stitch together configurations manually.

The feature supports a range of common authentication mechanisms, including:
This flexibility ensures teams can quickly configure access regardless of how their application or API is secured. Learn more details about the supported authentication types.
Authentication requirements often evolve. Instead of starting over, users can iteratively refine their configurations using natural language prompts.
For example, if you want to change how credentials are injected into the auth flow, you can simply say:
“Change the injection type to header name.”
By selecting “Refine with AI,” the system updates the existing configuration accordingly—no manual edits required.
Every AI-generated or modified configuration includes inline comments that explain what changed. These comments make it easier for teams to:

Additionally, no credentials are stored in logs or persisted in prompts. Any sensitive authentication material is masked and encrypted at rest.
By reducing setup errors and simplifying authentication configuration, this Harness API Testing feature directly improves scan success rates. Teams can spend less time troubleshooting authentication issues and more time analyzing real security findings.
This release is more than just a usability improvement. It’s a foundational step towards enabling AI-driven security workflows.
By removing the friction of authentication setup, teams can:
Ultimately, this translates to a faster time-to-value and a more scalable approach to dynamic application security testing.
AI-Powered Custom Authentication Generation is available immediately with your existing Harness subscription. You can find related technical documentation here.
Current Customers: Log in to your dashboard today to start exploring your threat data in a whole new dimension.
New to the Platform? If you aren't yet protected, contact us to schedule a personalized demo.


There is a version of the Legal team that exists in most companies: thorough, careful, and quietly overwhelmed. Good lawyers are spending their days on tasks that really should not require a lawyer at all.
We decided early on that this was not the team we wanted to be.
At Harness, the AI-first approach is not just saved for the engineering team. It is how every team operates, including Legal. That means we stopped asking “should we use AI?” a long time ago and started asking “how do we build with AI?” The result is a Legal team that does not just use AI tools. It develops them. We close faster, advise smarter, and frankly, have more fun doing it!
Every tool in our stack has a job. Here is what that looks like in practice:
The honest answer: the relationship between Legal and the rest of the business.
Turnarounds that used to take days take hours. Quality has gone up, not down. And because teams can self-serve answers to routine questions through our Legal Playbooks, the requests that do reach us are the ones that genuinely need us. We spend less time being a checkpoint and more time being a partner. That is a different job, and a more meaningful one.
Moving fast with AI does not mean being reckless about it:
What makes this more than a policy is the culture around it. We run regular sessions where the team shares what they are learning: tools worth trying, prompting approaches that actually work for legal drafting, and ways to get more out of what we already use. When one person figures something out, everyone benefits. That collective curiosity is what stops this from becoming shelfware and keeps it genuinely evolving.

If this is what Legal looks like at Harness, imagine the rest.
Every team here operates this way. Not because they are told to, but because it is genuinely a better way to work. If you are looking for a company where AI is woven into how things actually get done, not just what gets announced, we are hiring!


If your Terraform install is insecure or inconsistent, it can quickly slow down your delivery. A single compromised file or a misconfigured backend can stop deployments for many services. Teams that set up Terraform correctly from the start can scale easily and avoid compliance issues.
The answer is to install Terraform with strong security measures right from the beginning. Use verified binaries, encrypt your state, and set up automated CI/CD integration from day one. This method includes OS-specific setup, security checklists, GitOps alignment, and governance that can grow with your company. Want to speed up secure infrastructure automation? Harness Infrastructure as Code Management offers AI-powered pipelines with built-in governance for enterprises.
One misconfigured Terraform install can cause hours of pipeline failures across many services. When setting up Terraform on development machines, build agents, and production, focus on consistency and security for reliable automation. Start with verified binaries, pinned versions, and automated checks to keep your infrastructure stable.
Always get Terraform from HashiCorp’s official repositories, not from third-party mirrors or unofficial packages. For macOS, use the official Homebrew tap (brew tap hashicorp/tap && brew install hashicorp/tap/terraform).
On Linux, add HashiCorp’s GPG-signed package repository instead of using versions from your distribution, which may be outdated. Windows users should download signed binaries directly from releases.hashicorp.com. This helps keep your infrastructure safe from compromised or outdated packages.
To make builds reproducible, control the exact Terraform version in every environment. Download the specific version you need, such as from https://releases.hashicorp.com/terraform/1.6.0/terraform_1.6.0_linux_amd64.zip, and check the SHA256 checksum against HashiCorp’s signed SHASUMS file before extracting.
Keep your version-pinned install scripts in your infrastructure repository so teams can create identical environments. If you use Terraform with Harness, delegates manage versions for you, but local development still needs consistent versioning.
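For example, a pinned install step can fetch the archive and refuse to proceed unless its SHA256 matches the published SHASUMS manifest. Here is a sketch in Python; the version, platform, and file naming follow the HashiCorp release layout shown above, and in a hardened pipeline you would also verify the manifest's GPG signature before trusting it:

```python
import hashlib
import urllib.request

VERSION = "1.6.0"          # pin the exact build your repository standardizes on
PLATFORM = "linux_amd64"
BASE = f"https://releases.hashicorp.com/terraform/{VERSION}"
ARCHIVE = f"terraform_{VERSION}_{PLATFORM}.zip"

# Download the release archive and the checksum manifest.
archive_bytes = urllib.request.urlopen(f"{BASE}/{ARCHIVE}").read()
shasums = urllib.request.urlopen(f"{BASE}/terraform_{VERSION}_SHA256SUMS").read().decode()

# Find the expected digest for this platform and compare it to what we downloaded.
expected = next(line.split()[0] for line in shasums.splitlines() if line.endswith(ARCHIVE))
actual = hashlib.sha256(archive_bytes).hexdigest()

if actual != expected:
    raise SystemExit(f"Checksum mismatch for {ARCHIVE}: refusing to install")

with open(ARCHIVE, "wb") as f:
    f.write(archive_bytes)
print(f"{ARCHIVE} verified: {actual}")
```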
After installing Terraform, run terraform version to make sure the right version is active and in your PATH. Set up the plugin cache directory (TF_PLUGIN_CACHE_DIR) to avoid repeated provider downloads and check that you have write permissions.
Write a simple script to check the Terraform binary location, version, and basic provider setup. Run this script automatically in your CI/CD pipelines, container builds, and onboarding workflows to catch problems before they affect deployments. While local installation is useful for development, enterprise teams should standardize Terraform execution through an IaCM platform. This ensures consistent environments across developers, CI/CD pipelines, and production systems without relying on manual setup.
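A minimal version of that validation script might look like the following. It assumes a recent Terraform release that supports `terraform version -json`, and the expected version is a placeholder you would wire to whatever your repository pins:

```python
import json
import os
import shutil
import subprocess
import sys

EXPECTED_VERSION = "1.6.0"  # keep in sync with the version pinned in your install scripts

binary = shutil.which("terraform")
if binary is None:
    sys.exit("terraform not found on PATH")

# `terraform version -json` returns a machine-readable version report.
report = json.loads(subprocess.run(
    ["terraform", "version", "-json"], capture_output=True, text=True, check=True
).stdout)
if report["terraform_version"] != EXPECTED_VERSION:
    sys.exit(f"Expected Terraform {EXPECTED_VERSION}, found {report['terraform_version']} at {binary}")

# Confirm the plugin cache directory exists and is writable to avoid repeated provider downloads.
cache_dir = os.environ.get("TF_PLUGIN_CACHE_DIR", "")
if not cache_dir or not os.access(cache_dir, os.W_OK):
    sys.exit("TF_PLUGIN_CACHE_DIR is unset or not writable")

print(f"OK: Terraform {EXPECTED_VERSION} at {binary}, plugin cache at {cache_dir}")
```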
Installing Terraform is only the beginning. In enterprise settings where you manage important infrastructure and need to meet regulations, hardening your Terraform setup turns a basic install into a system ready for production and governance. These controls are significantly easier to enforce when Terraform is managed through an IaCM platform that centralizes execution, credentials, and policy enforcement.
Credential Management and Execution Isolation:
Provider Security and Integrity:
State Management and Backend Security:
Make your Terraform CI/CD setup consistent by including the binary in versioned container images or reusable templates that all services use. This prevents differences between developer machines, build agents, and production. This approach can become even more scalable when implemented through an IaCM tool integrated with your CI/CD platform where Terraform execution, policy checks, and governance are built into reusable workspaces and modules.
When updating Terraform versions or security patches, make changes in your template library instead of updating each pipeline one by one. We recommend this version-controlled method for enterprise customers.
Use Policy as Code checks to enforce governance by validating Terraform versions, approved modules, and provider rules before running any plans. OPA can review Terraform plans in your CI/CD pipeline, automatically approving safe changes and flagging risky ones for manual review.
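OPA policies are written in Rego, but the shape of the check is easy to see in a simplified Python stand-in that inspects `terraform show -json` plan output and flags destructive changes or unapproved providers (the provider allowlist here is purely illustrative):

```python
import json
import sys

# Simplified stand-in for an OPA policy, operating on `terraform show -json tfplan` output.
APPROVED_PROVIDERS = {"registry.terraform.io/hashicorp/aws"}  # example allowlist

plan = json.load(open(sys.argv[1]))
violations = []

for rc in plan.get("resource_changes", []):
    actions = set(rc["change"]["actions"])
    if "delete" in actions:
        violations.append(f"{rc['address']} would be destroyed")
    provider = rc.get("provider_name", "")
    if provider and provider not in APPROVED_PROVIDERS:
        violations.append(f"{rc['address']} uses unapproved provider {provider}")

if violations:
    print("Plan requires manual review:")
    print("\n".join(f"  - {v}" for v in violations))
    sys.exit(1)

print("Plan passes automated policy checks")
```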
Pair this with GitOps workflows, where pull requests start plans and approved merges trigger applies. This creates clear audit trails for compliance and keeps developers moving quickly. Instead of treating Terraform as a standalone CLI step, IaC tools allow you to manage infrastructure workflows as first-class citizens within your delivery pipelines.
DevOps teams running hundreds of services need Terraform installation methods that scale and stay secure and compliant. Here are practical answers to common questions from teams in regulated settings.
Start with package repositories that include GPG verification rather than direct binary downloads to prevent compromised or malicious software packages. Install from official HashiCorp repositories with signed packages, verify SHA256 checksums, and run Terraform from isolated build environments with limited-access credentials that only provide necessary permissions. Keep your state files in encrypted, secure storage with access controls and comprehensive audit logging.
Include Terraform in your container images with specific versions, or use custom binaries to keep all pipeline runs consistent. Pin exact builds in your pipeline templates and use policy-as-code to allow only approved releases before running plans. This keeps development and production in sync and maintains clear compliance records.
Make reusable install scripts that check checksums and pin builds, then share them through central config management or container registries. Use remote execution on dedicated infrastructure for security and audit trails. Apply OPA policies to control which Terraform releases and providers your teams can use.
Running Terraform remotely on dedicated infrastructure gives you better security and audit trails. Running it locally on developer machines can cause compliance and credential issues. Use isolated build environments or cloud-managed services that run Terraform plans with proper authentication and detailed logs for production. Even better, IaC platforms standardize this by enforcing remote execution with built-in security, auditability, and role-based access controls.
Set up golden path templates with pinned Terraform installs that update all services automatically. Distribute approved releases using container images or package managers, or use platforms that handle governance for you. IaC platforms automate this by centrally managing Terraform versions and enforcing them across all pipelines and environments.
Standardizing how you install Terraform sets the stage for everything else. Pinning versions, using verified binaries, and securing remote state help your teams work quickly and stay compliant. These best practices are the base for templates that scale to hundreds of services.
Once you have this foundation, the real benefits come when your install standards connect to automated pipelines and GitOps workflows. Using centralized templates and modules for Terraform means security updates are spread automatically, and developers keep their flexibility. Policy-as-code makes sure every deployment meets enterprise needs without slowing things down. At this stage, adopting an IaC Platform approach becomes the recommended path. By managing Terraform through platforms like Harness, teams can standardize execution, enforce governance, and scale infrastructure delivery without increasing operational overhead.
Are you ready to move from manual installs to enterprise-level automation and governance? Harness Infrastructure as Code Management offers AI-powered templates, a central control plane, and automated checks to make your Terraform setup a real advantage.



As AI-assisted coding accelerates the velocity of software releases, ensuring stable deployments becomes harder, and platform teams are feeling the hit. The State of AI-assisted Software Development DORA report measured a negative impact on software delivery stability: “an estimated 7.2% reduction for every 25% increase in AI adoption.”
The DORA report advises:
Considered together, our data suggest that improving the development process does not automatically improve software delivery—at least not without proper adherence to the basics of successful software delivery, like small batch sizes and robust testing mechanisms.
A robust testing mechanism rapidly gaining momentum is testing in production. Let’s take a closer look at how this practice boosts software delivery stability and supports the software development lifecycle (SDLC). We’ll also consider how to make testing in production, specifically A/B testing at scale, work for you.
Testing in production (TIP) means testing new software code on live traffic in active real-world environments. TIP is complementary to pre-production testing and does not replace it. It does, however, carry tangible benefits:
Feature flags are instrumental in the practice of safe testing in production because they decouple deployment and exposure at the most granular level. By means of feature flags, you implement incremental feature release techniques and unlock progressive experimentation. With carefully crafted A/B testing, you empower rapid feedback loops that confirm real feature value, validate high quality software, and increase team productivity and satisfaction.
These testing and verification capabilities are crucial as never before in this “AI moment” where AI-assisted coding enjoys wide adoption and funding.
A/B testing is the process of simultaneously testing two different versions of a web page or product feature in order to optimize a behavioral or performance metric, while ensuring guardrail metrics are not negatively impacted. A/B testing spans the whole spectrum of software verification: you can safely carry out architectural validation on fundamental architectural changes or gather behavioral analytics on UI variations.
Progressive experimentation with feature flags lets you roll out changes to a small slice of users first, catch problems early, and expand only when the data looks good.
The key is keeping deployment and release separate. You decouple deployment and release by delivering new features in a dormant state. Code goes out behind a flag. You validate it with real traffic.
A/B testing built into your CI/CD pipeline means you're making data-driven decisions based on observed metrics. Advanced feature flagging correlates statistical data, with pinpoint precision, to the actual feature variation causing the impact. Even when multiple features are rolled out concurrently, an enterprise-grade feature management platform will effectively parse the data, alert you to the impactful variant, and enable you to roll back any negative feature in seconds. The time/cost savings and safety benefits are astounding.
A/B testing provides a great experience for both marketing teams and engineers:
An enterprise-level platform like Harness provides Feature Management and Experimentation, bringing flags, monitoring, and full experimentation freedom into a finely-tuned, seamless end-to-end software delivery tech stack for your platform team. Integrating A/B testing and feature flags directly into CI/CD pipelines empowers your teams with self-service experimentation while maintaining enterprise governance and security.
Bundling features into cliff-jump releases puts every user account at risk simultaneously. A progressive ramp—starting with just 1 or 2% of traffic, and gradually increasing—means a bug in your checkout flow only affects a fraction of users before you catch it. Progressive delivery validates that SLOs are holding before exposure expands. p95 latency spiking? Error rate creeping up? You catch it when a tiny fraction of users are affected, not thousands. And because Harness CD integrates cleanly with Jenkins, GitLab, or GitHub Actions, these ramps run inside the pipelines you already have.
The deploy-and-hold pattern is the keystone. Ship code in the "off" state behind a feature flag and nothing changes for users until you're ready. Deploy at 11 AM on a Tuesday instead of 1 AM on a Sunday. No change windows, no dashboard babysitting. Code is in production, the feature is dark, and you flip the switch when you're ready to monitor it. That's the freedom of progressive experimentation with feature flags in practice.
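In code, deploy-and-hold is just a guarded branch: the new path ships dark and nothing runs it until the flag turns on. A minimal sketch with a hypothetical flag client, not any particular SDK:

```python
class FlagClient:
    """Stand-in for a feature flag SDK; real clients evaluate targeting rules and ramp percentages."""
    def __init__(self, enabled_flags):
        self.enabled_flags = set(enabled_flags)

    def is_enabled(self, flag_name, user_id):
        return flag_name in self.enabled_flags

def legacy_checkout(cart):
    return f"legacy checkout for {len(cart)} items"

def new_checkout(cart):
    return f"new checkout for {len(cart)} items"

def checkout(user_id, cart, flags):
    # Deploy-and-hold: the new flow is in production but dark until the flag is flipped.
    if flags.is_enabled("new-checkout-flow", user_id=user_id):
        return new_checkout(cart)
    return legacy_checkout(cart)

flags = FlagClient(enabled_flags=[])                      # feature ships dark
print(checkout("user-42", ["sku-1"], flags))              # legacy path
flags = FlagClient(enabled_flags=["new-checkout-flow"])   # flip the switch when ready
print(checkout("user-42", ["sku-1"], flags))              # new path
```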
Raw telemetry is information in theory and chaos in practice. AI-powered monitoring watches flag-level metrics—not just "something is slower," but "checkout button variant B is adding 43ms of p95 latency." That specificity matters. When you have six active experiments running, your engineers are not flipping through dashboards trying to isolate which one broke something. The system tells you.
If your team is already running feature flags with health monitoring, you're closer to a full experimentation platform than you might think. Targeting logic, rollout percentages, kill switches—that's already experiment infrastructure. What's missing is experiment tracking, statistical analysis, and deterministic assignment.
To implement experiments with your feature flagging:
An experimentation system built on top of your feature flagging makes A/B testing a cinch and eliminates operational bottlenecks and technical debt for your platform team.
A/B testing doesn't have to be complicated. It can run as part of a structured rollout with automated KPI metrics and guardrails:

The seven stages are built into your pipeline and completed with minimal human intervention:
A common mistake is ramping too fast and drawing conclusions from thin data. If your sample size is too low, your experiment will be underpowered, and you will be unlikely to detect a reasonably-sized impact. Confirm up front that your sample is large enough to detect impacts of the size that matters to you.
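As a rough guide, the standard two-proportion power calculation tells you how many users each variant needs before you can detect the effect you care about. A sketch using the usual normal-approximation formula; the baseline conversion rate and minimum detectable effect below are illustrative:

```python
from scipy.stats import norm

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.8):
    """Users needed per variant to detect an absolute lift `mde` over a `baseline` conversion rate."""
    p1, p2 = baseline, baseline + mde
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

# Detecting a 0.5-point lift on a 4% checkout conversion needs roughly 25,000 users per variant.
print(sample_size_per_variant(baseline=0.04, mde=0.005))
```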
Progressive experimentation requires patience. Premature conclusions produce unreliable results, and unreliable results produce bad decisions.
Every experiment should have a documented hypothesis, defined success metrics, blast radius assessment, and rollback plan before it touches production. Feature flag lifecycle management also keeps technical debt from quietly accumulating—flags that never get retired are toggle debt and a production surprise waiting to happen.
The goal isn't just fewer 3 a.m. incidents, though that's a welcome side effect. The real win is replacing gut feel with data at every stage of delivery.
With modern testing in production: feature flags decouple deploy from release, progressive ramps limit blast radius, AI-powered guardrails catch regressions before they spread, and centralized analytics replace the multi-tool sprawl that makes experimentation feel expensive.
Every time you release a feature you can ramp gradually up to 100% using percentage-based rollouts, alert on specific pre-decided latency increases, and enforce minimum sample sizes before promotion. Let every release become a decision backed by actual evidence, not optimism.
Harness Feature Management & Experimentation consolidates flags, release monitoring, and A/B testing, so every deployment is a controlled experiment—not a gamble.
How do you pick guardrail metrics without blocking every release?
Start with your existing SLO metrics and be conservative. Grafana's SLO guidance recommends event-based SLIs over percentiles for cleaner signals. Focus on business-critical user journeys first.
What's a practical ramp schedule for a mid-sized SaaS team?
Every team has slightly different criteria to consider before safely ramping up. Release monitoring with automated guardrails removes the need for someone to manually review metrics at each stage—which is the only way this actually scales.
How do you handle sample ratio mismatch?
Monitor assignment ratios continuously using chi-squared tests. Harness FME’s attribution and exclusion algorithm is honed to ensure accurate samples. In addition, FME reassesses experiment health in real-time, including sample ratio.
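If you want to roll your own check, the idea is simple: compare observed assignment counts against the counts your split should have produced and alert when the chi-squared p-value is implausibly small. A sketch; the counts and alert threshold are illustrative:

```python
from scipy.stats import chisquare

def check_sample_ratio(observed_counts, expected_split, p_threshold=0.001):
    """Flag sample ratio mismatch when observed assignments deviate from the planned split."""
    total = sum(observed_counts)
    expected_counts = [total * share for share in expected_split]
    _, p_value = chisquare(f_obs=observed_counts, f_exp=expected_counts)
    return p_value, p_value < p_threshold

# A 50/50 experiment that actually assigned 51,000 vs 49,000 users: p is tiny, so raise an alert.
p_value, mismatch = check_sample_ratio([51_000, 49_000], [0.5, 0.5])
print(f"p={p_value:.2e}, sample ratio mismatch: {mismatch}")
```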
Filter bot traffic early too. Microsoft's bot detection research shows bots can skew conversion rates by 15–30%. Behavioral signals like sub-10-second session duration or unusual referrer patterns are a practical starting point for exclusion algorithms.
Should you A/B test infrastructure changes or just product features?
A/B testing works best for user-facing changes where behavior matters. Infrastructure changes are better suited to progressive rollouts with guardrail monitoring—different changes, different success metrics. Performance and reliability for engineering experiments; conversion and engagement for growth. Keep the tooling integrated in your pipeline either way.
How do you maintain consistent user experiences across devices and services?
Deterministic hashing on stable user IDs. Hash user ID plus experiment name to generate consistent assignments and make sure the same user sees the same variant whether they're on mobile, desktop, or clearing cookies every 20 minutes. Avoid session-based bucketing—it creates flickering experiences, causes re-bucketing, and erodes trust in experiment data. Lean on SDK-side evaluation for consistency that holds across your entire stack.
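A minimal sketch of that assignment logic: hash the experiment name together with the user ID, map the digest to a bucket in [0, 100), and compare it against the variant split. Real SDKs use their own hashing schemes; this just shows why the assignment stays stable across devices:

```python
import hashlib

def assign_variant(user_id, experiment, split=(("control", 50), ("treatment", 50))):
    """Deterministically map a user to a variant; same inputs always return the same answer."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    cumulative = 0
    for variant, share in split:
        cumulative += share
        if bucket < cumulative:
            return variant
    return split[-1][0]

# Same user, same experiment: identical variant on mobile, desktop, or after clearing cookies.
print(assign_variant("user-42", "checkout-button-color"))
print(assign_variant("user-42", "checkout-button-color"))
```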


At 2 am, your migration goes live. By 2:07, error rates spike, and rollback isn’t an option. Cloud migrations, API rewrites, and architecture transformations rarely fail because of bad code. They fail because of how that code is released.
Most teams still rely on a “big bang” cutover where infrastructure, services, and user-facing changes go live at once. This concentrates risk into a single moment. When something breaks, rollback is slow, visibility is limited, and the blast radius is large.
This is not just anecdotal. According to BCG, more than half of transformation efforts fail to achieve their intended outcomes within three years.
The difference between success and failure is not the migration itself. It is the release strategy.
“Cloud migration” sounds simple, but in practice, it is a layered transformation.
Most migrations combine several of the following:
These rarely happen in isolation. Teams often try to ship them together in a single coordinated release. That coupling increases complexity and multiplies risk.
Before your next migration, list every system involved. If they are all released together, you are carrying unnecessary risk.
The failure mode is consistent:
There is no safe way to validate behavior in production. There is no gradual exposure. Rollback often requires redeploying an old stack that may no longer be compatible.
Even worse, teams lack a reliable baseline. They cannot answer simple questions:
Without that, migration becomes guesswork.
Modern teams are adopting a different model:
Feature flags provide a control layer that separates deployment from exposure. Code can exist in production without being active for all users.
This enables:
Start by putting one service behind a feature flag and releasing it to internal users first.
Instead of switching everything at once:
If something fails, you reduce traffic or revert instantly.
This shifts migration from a single high-risk event to a series of measurable steps.
A common migration strategy is the strangler fig pattern.
Feature flags make this executable in production by controlling routing and exposure. But to make this work in practice, you need a control layer that can manage traffic in real time.
Below is a simplified view of how feature flags act as a control plane during migration:

Fig: Feature-flag–driven progressive traffic routing during migration
Two things matter here:
This is not just a toggle. It is a runtime decision and an observability layer.
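A simplified sketch of that control layer: evaluate a rollout percentage per request, route the chosen share of traffic to the new service, and record which path served each request so the two systems can be compared. The service names and router are hypothetical, not a specific product API:

```python
import hashlib

class MigrationRouter:
    """Routes a configurable percentage of traffic to the new backend; the rest stays on legacy."""
    def __init__(self, rollout_percent):
        self.rollout_percent = rollout_percent  # raise gradually: 1 -> 5 -> 25 -> 100

    def use_new_backend(self, request_id):
        bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
        return bucket < self.rollout_percent

    def handle(self, request_id):
        target = "orders-service-v2" if self.use_new_backend(request_id) else "orders-monolith"
        # In a real system you would also emit latency and error metrics tagged by target,
        # so both paths can be compared before widening the rollout or reverting instantly.
        return target

router = MigrationRouter(rollout_percent=5)
print([router.handle(f"req-{i}") for i in range(10)])
```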
A successful migration is not defined by deployment success. It is defined by outcomes.
Key metrics include:
These metrics are not theoretical. They are what teams use to validate migrations in real production environments.
In the Beyond the Toggle ebook, a legacy Spark batch pipeline was replaced with a streaming architecture using a progressive rollout rather than a cutover.
The new system showed faster processing and lower costs before the full rollout.
From the webinar, teams often go further:
This allows validation of both performance and data integrity before committing.
Define your baseline metrics before migration. If you cannot measure improvement, you cannot prove success.
Staging environments cannot replicate production conditions. They lack:
Feature flags enable safe production testing through controlled exposure.
Not all canary releases are percentage-based. Some teams roll out by country or user segment first, then expand globally.
To make this safe:
A migration is a sequence of decisions, not a single moment.
At each stage:
In one example from the webinar:
This approach removes pressure from a single “launch moment” and distributes risk across stages.
Modern flag systems avoid becoming a bottleneck:
This ensures minimal latency and high reliability.
Not all migrations are equal.
The key is incremental transition, not avoidance.
Feature flags are temporary by design.
If left unmanaged, they accumulate and create complexity. Teams need:
Emerging approaches include automation that detects stale flags and generates pull requests to remove them.
Adopting progressive delivery is not just a tooling decision. It changes how teams release software.
Key considerations:
Feature flags do not bypass controls. They enhance them by adding visibility and control at runtime.
For migration use cases, a Feature Flag platform should provide:
Flags should not feel like a bolt-on. They should be part of how software is built and released.
The biggest mistake teams make is treating migration as a moment.
It is not.
It is a controlled progression of changes, each validated in production under real conditions.
Feature flags enable this by:
The result is simple:
Migrations become reversible, observable, and data-driven.
Want a deeper breakdown of these patterns and real-world examples? Read the full ebook or see a demo.


Businesses today run on computers, cloud systems, and digital tools. One big failure can stop everything. A cyber attack, a power outage, or a software glitch can shut down operations for hours or days. Disaster recovery testing is how you prove you can restore critical services when the unexpected happens.
In 2026, with hybrid and multi-cloud estates, distributed data, and tighter oversight, this is not a once-a-year fire drill. It is a continuous discipline that validates plans, uncovers weak links before they cause outages, and gives leaders confidence that customer-facing and internal systems can bounce back on demand.
Disaster recovery testing is a simple way to practice getting your systems back online after something goes wrong. It checks if your backup plans actually work before a real problem hits. This blog gives you a clear, step-by-step look at what it is, why it is essential right now, and how to get started.
Disaster recovery testing is a structured way to confirm that systems, data, and services can be restored to meet defined recovery goals after a disruption. The mandate is simple: verify that recovery works as designed and within the time and data loss thresholds the business requires. Effective programs test more than technology. They exercise people, processes, communications, and third-party dependencies end to end. The goal is to prove you can bring back data, apps, and services quickly with little loss.
A strong disaster recovery test plan typically covers:
Without regular tests, even the best plan stays unproven. Many companies learn this the hard way when an outage lasts longer than expected.
Different systems require different levels of validation based on their criticality, risk, and business impact. A layered testing strategy helps teams build confidence gradually, starting with low-risk discussions and moving toward full-scale failovers.
By combining multiple types of tests, organizations can validate both technical recovery and team readiness without unnecessary disruption.
Tabletop Exercises:
Tabletop exercises are discussion-based sessions where stakeholders walk through a hypothetical disaster scenario step by step. These are typically the starting point for any disaster recovery program, as they help clarify roles, responsibilities, and decision-making processes. While they do not involve actual system changes, they are highly effective in identifying communication gaps and aligning teams on escalation paths.
Simulations:
Simulations introduce more realism by creating scenario-driven drills with staged alerts and mocked dependencies. Teams respond as if a real incident is happening, but without impacting production systems. This type of testing is useful for validating how teams react under pressure and ensuring that tools, alerts, and workflows function as expected in a controlled environment.
Operational Walkthroughs:
Operational walkthroughs involve executing recovery runbooks step by step to verify that all prerequisites such as permissions, tooling, and sequencing are in place. These tests are more hands-on than simulations and are often conducted before attempting partial or full failovers. They help reduce surprises by ensuring that recovery procedures are practical and executable.
Partial Failovers:
Partial failovers test the recovery of specific services, components, or regions, usually during off-peak hours. This approach allows teams to validate critical dependencies and recovery workflows without risking the entire system. It is especially useful for building confidence in complex environments where a full failover may be too risky or costly to perform frequently.
Full Failovers:
Full failovers are the most comprehensive form of disaster recovery testing, where production systems are completely switched to a secondary site or region. After validation, systems are failed back to the primary environment. These tests provide the strongest proof of resilience, as they validate end-to-end recovery, including performance and data integrity, but they require careful planning due to their potential impact.
Automated Validations:
Automated validations use codified workflows or pipelines to continuously test recovery processes. These tests can automatically spin up recovery environments, validate configurations, and run health checks. They are ideal for frequent, low-risk testing and help reduce human error while providing fast and consistent feedback. Over time, automation becomes a key driver for maintaining continuous assurance in disaster recovery readiness.
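As a small example of what a codified validation can look like, the script below probes a recovery environment's health endpoint and fails the pipeline if recovery takes longer than the agreed RTO. The endpoint and thresholds are illustrative placeholders:

```python
import time
import urllib.request

RECOVERY_ENDPOINT = "https://dr.example.internal/healthz"  # hypothetical recovery-site health check
RTO_SECONDS = 15 * 60  # agreed recovery time objective

def wait_for_recovery(endpoint, timeout):
    """Poll the recovery environment until it reports healthy or the RTO window expires."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        try:
            if urllib.request.urlopen(endpoint, timeout=5).status == 200:
                return time.monotonic() - start
        except OSError:
            pass  # recovery environment not reachable yet
        time.sleep(10)
    return None

elapsed = wait_for_recovery(RECOVERY_ENDPOINT, RTO_SECONDS)
if elapsed is None:
    raise SystemExit("DR validation failed: recovery endpoint never became healthy within the RTO")
print(f"Recovery healthy after {elapsed:.0f}s, within the {RTO_SECONDS}s RTO")
```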
The table below outlines the primary types of disaster recovery testing and where they fit.

If you are building a disaster recovery testing checklist, include a mix of these types of disaster recovery testing and map each to the systems they protect. Over time, increase the frequency of automated validations and reserve full failovers for the highest-value services.
The world is more connected than ever. Companies rely on cloud services, remote teams, and AI tools. At the same time, threats keep growing. Cyber attacks like ransomware are more common. Natural events and supply chain problems add extra risk. Cloud systems can fail without warning.
Recent studies show the cost of downtime keeps rising. For many large companies, one hour of downtime can cost more than 300,000 dollars. Some industries see losses climb into the millions per hour. Smaller businesses lose thousands per minute in lost sales and unhappy customers.
In 2026, experts note that most organizations still test their recovery plans only once or twice a year. That is not enough. Systems change fast. New software updates, new cloud setups, and new team members can break old plans.
Regular testing gives you confidence. It cuts recovery time and protects revenue. It also helps meet rules from banks, healthcare groups, and government agencies that require proof of preparedness.
Traditional testing took weeks of manual work. Today, platforms combine different testing methods in one place. This approach saves time and gives better results.
For example, Harness recently released its Resilience Testing module. It brings together chaos testing (to inject real-world failures safely), load testing (to check performance under stress), and disaster recovery testing. You can run everything inside your existing pipelines. This means you can test recovery steps automatically, validate failovers, and spot risks early.
Teams using this kind of integrated platform report faster recovery times and fewer surprises. It fits right into daily development work instead of feeling like an extra project.
Artificial intelligence is making disaster recovery testing much smarter in 2026. It turns testing from a once-a-year chore into something fast, ongoing, and more accurate.
AI helps teams spot problems early by analyzing system data and predicting where failures might happen, allowing issues to be fixed before they cause real damage. It also enables continuous and automated testing, running scenarios in the background without interrupting normal business operations. Instead of manually creating test plans, AI can generate and recommend the most relevant scenarios based on your actual system setup, saving time and improving coverage.
Another major advantage is how quickly AI can analyze results. It processes test outcomes in real time and clearly points out what needs to be fixed, removing the guesswork. Over time, it learns from every test run and continuously improves your disaster recovery strategy, making it more reliable with each iteration.
Overall, AI helps teams recover faster and with fewer mistakes. Rather than relying on assumptions, teams get clear, data-driven insights to strengthen their systems. Tools like the Resilience Testing module from Harness already bring these capabilities into practice by combining chaos testing, load testing, and disaster recovery testing. With AI built into the platform, it can recommend the right tests, automate execution, and provide simple, actionable steps to improve system resilience.
Disaster recovery testing is not a one-time task. It is an ongoing habit that protects your business in 2026 and beyond. The companies that test regularly recover faster, lose less money, and keep customer trust.
Take a moment now to review your current plan. Pick one critical system and schedule a simple test this quarter. If you want a modern way to make the process simple and powerful, look at solutions like the Resilience Testing module from Harness. It helps you combine multiple testing types and use AI so you stay ready no matter what comes next.
Your business depends on technology. Make sure that technology can bounce back when it counts. Start testing today and build the confidence your team needs for whatever 2026 brings.


The design of the Harness MCP (Model Context Protocol) server is driven by a pattern that keeps reappearing across systems that scale well: small, stable interfaces with most of the complexity pushed behind a dispatch layer. The central idea is this: the agent loop behaves like an operating system boundary. The LLM is the reasoning engine, the context window is working memory, tool calls act like syscalls, and the MCP server serves as a kernel that mediates access to underlying systems. This isn’t a literal equivalence, but it’s a useful design lens. It forces you to think in terms of memory pressure, interface stability, and clean I/O contracts.
We built the Harness MCP server to make Harness agent-native. In practice, that means exposing the platform through a runtime-discoverable, schema-driven interface that agents can inspect, select from, and compose without hardcoded knowledge of the domain. Today, that interface consists of 10 generic tools that dispatch to 30 toolsets covering 140+ resource types across the platform, along with 57 Knowledge Graph views for cross-module analytics.
Those numbers matter less than the constraint behind them: tool count stays constant while capability scales through data and dispatch. The goal is to keep the agent’s context focused on reasoning, not on parsing a large menu of endpoints.
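To make that constraint concrete, here is a deliberately simplified sketch of the pattern, not the actual Harness MCP implementation: a few generic verbs whose behavior comes from a registry of resource handlers, so adding a resource type adds data and dispatch, not another tool in the agent's menu.

```python
from typing import Callable, Dict

# Registry of resource handlers: capability grows here, not in the tool surface the agent sees.
HANDLERS: Dict[str, Dict[str, Callable[[dict], object]]] = {
    "pipeline": {
        "list": lambda params: ["build-and-deploy", "nightly-regression"],
        "get": lambda params: {"name": params["name"], "stages": 4},
    },
    "feature_flag": {
        "list": lambda params: ["new-checkout-flow"],
        "get": lambda params: {"name": params["name"], "state": "off"},
    },
}

# The only "tools" exposed to the agent: a small, stable grammar.
def describe_resource_types() -> list:
    """Runtime introspection: the agent discovers what exists instead of memorizing it."""
    return sorted(HANDLERS)

def list_resources(resource_type: str) -> object:
    return HANDLERS[resource_type]["list"]({})

def get_resource(resource_type: str, params: dict) -> object:
    return HANDLERS[resource_type]["get"](params)

print(describe_resource_types())
print(get_resource("pipeline", {"name": "build-and-deploy"}))
```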
Before getting into the architecture, though, it’s worth asking a simpler question: why does Claude feel so capable when you give it nothing more than a bash shell?
Give Claude access to a terminal. Just bash. No APIs, no SDKs, no custom tools. It can navigate an unfamiliar codebase, find a bug across 50 files, refactor code, run tests, and commit end-to-end.
Now give an LLM access to a hundred perfectly-documented REST endpoints. It gets confused by the tool count, picks the wrong endpoint, and loses track of multi-step operations.
The difference isn't the tools themselves. It's the shape of the interface. The point isn’t that shell text streams are superior to structured APIs, but that agents perform better with interfaces that have a small, consistent grammar and are easy to compose.
Bash provides three properties that matter enormously for agent reasoning:
Composability. Every Unix tool does one thing and communicates through a uniform interface: text streams. grep | sort | uniq -c | head is four tools composed into an analytical pipeline. The agent doesn't need to know about a special "count-unique-matches" API. It composes primitives.
Uniform interface. Every tool takes text in and produces text out. There's no per-tool protocol, no per-tool authentication, no per-tool response schema. The contract is always the same: stdin, stdout, and exit code.
Introspection. ls, find, file, cat, head — the agent can discover what exists at runtime. It doesn't need to memorize the file system layout. It explores, then acts.
These three properties mean the agent doesn't need to hold 200 tool schemas in its context window. It learns a small set of verbs and composes them. The intelligence isn't in any single tool. It's in the loop that decides what to call next.
Watch what actually happens when Claude debugs with bash:
1. Observe: ls src/ → see the project structure
2. Hypothesize: "error likely in auth module"
3. Act: grep -r "token" src/auth/
4. Observe: see the grep output
5. Refine: "ah, token expiry not handled"
6. Act: cat src/auth/session.ts
7. Observe: read the file
8. Fix: edit the file
9. Verify: npm test

This is not "call the right API." This is a reasoning loop — observe, hypothesize, act, verify. The bash commands are just I/O. The reasoning happens between them.
This loop is the program. The tools are the I/O. And the design of the tools determines how efficiently the loop can run.
Every agent, whether it's Claude in a terminal, Cursor with MCP tools, or a custom orchestrator, runs some version of this loop:
while (!task_done) {
context = observe(environment) // tool outputs, previous results
plan = reason(context, goal) // LLM inference
action = select_tool(plan) // tool selection
result = execute(action) // tool call
environment.update(result) // state change
}

This is an event loop. The LLM is the scheduler (the scheduling behavior is an emergent property of the loop, not an intrinsic property of the LLM). The tools are I/O operations. The context window is working memory. Each iteration, the agent observes the current state, reasons about what to do next, selects a tool, executes it, and incorporates the result into its context.
The critical insight: the intelligence is in the loop, not in the tools. The tools just move information in and out. The loop is what plans, backtracks, retries, composes, and converges.
This means the quality of the agent's output depends on two things: the reasoning quality of the loop itself, and the design of the tools it dispatches into.
If the tools are well-designed (few, composable, self-describing, context-efficient), the loop can reason clearly. If the tools are poorly designed (many, verbose, opaque), the loop spends its context budget parsing menus and payloads instead of thinking.
Our MCP server does not implement the agent loop. The loop lives in the MCP host: Cursor, Claude Desktop, or whatever IDE/agent framework the user is running. Our server is stateless at the request level: each tool call arrives as a JSON-RPC message, runs an async handler, and returns a structured response. Task-level state lives in the MCP host and in the underlying Harness systems.
We implement the kernel that the loop dispatches into. Our job is to make each dispatch fast, clean, and context-efficient.
Before drawing the OS analogy, it's worth stepping back. The properties that make bash work for agents (composability, a uniform interface, and runtime discovery) aren't unique to Unix. They show up in every long-lived system that engineers describe as "just working."
Linux: The syscall ABI has been stable for decades. The VFS (Virtual File System) is a dispatch table: open(), read(), write(), close() work against ext4, NFS, procfs, sysfs, and any backend. New filesystem? Write a driver, load it at runtime. The interface never changes. /proc and /sys let the kernel describe itself through runtime introspection.
Git: Content-addressable blobs plus a handful of verbs. Branches are just pointers. The plumbing/porcelain split gives you a tiny, stable core with everything else built through composition. The transport protocol is uniform: push/fetch work the same over HTTP, SSH, or a local filesystem.
Kubernetes: Declare desired state. Controllers reconcile. kubectl get, apply, describe work on any resource kind: Pods, Services, your custom CRDs. New capability = new CRD, not a new CLI.
SQL: A small grammar (SELECT, JOIN, WHERE, GROUP BY) that works against any schema. The engine optimizes. You declare intent. The grammar has been stable for 40 years.
These systems share five properties: a small, stable interface; a uniform contract; composability; runtime introspection; and the ability to extend through data rather than new verbs.
This is the design target for agent infrastructure.
REST APIs answered two questions well: what exists (the resources and endpoints) and how to call them (methods, parameters, and response schemas).
For programs (code written by humans who already understood the domain), this was enough.
The developer read the docs, wrote the integration, and deployed it. The logic was pre-written.
An agent encounters your API at runtime, with no prior knowledge. It needs a third answer: why.
This "why" lived in documentation, READMEs, and developers' heads. It was never machine-readable. MCP fills this gap by making tools carry their own intent — descriptions, hints (readOnlyHint, destructiveHint), schemas, and metadata that the agent reads at runtime to decide what to call.
The difference between REST and MCP isn't the transport. It's the audience. REST APIs are typically optimized for pre-written integrations. MCP tool surfaces are optimized for runtime selection and composition by an agent.
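To make that concrete, here is a minimal sketch of what a self-describing tool can look like on the wire. The annotation names (readOnlyHint, destructiveHint) come from the MCP specification; the tool name, description, and schema fields are illustrative rather than the exact Harness definitions.

```typescript
// Illustrative MCP tool definition: the tool carries its own intent, so the
// agent can decide at runtime whether and when to call it. The annotation
// field names follow the MCP spec; everything else here is a sketch.
const harnessListTool = {
  name: "harness_list", // hypothetical name, used only for this example
  description:
    "List Harness resources of a given type (pipelines, services, connectors, ...). " +
    "Call harness_describe first to discover valid resource_type values and filters.",
  inputSchema: {
    type: "object",
    properties: {
      resource_type: { type: "string", description: "Registry resource type to list" },
      filters: { type: "object", description: "Optional type-specific filters" },
    },
    required: ["resource_type"],
  },
  annotations: {
    readOnlyHint: true,     // safe for the agent to call while exploring
    destructiveHint: false, // never mutates platform state
  },
};
```

Because the intent travels with the tool, the agent doesn't need out-of-band documentation to know this call is safe to make while exploring.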
The mapping between operating systems and agent platforms is more than metaphorical. Parts of it are structural, and the rest provides a useful design vocabulary. The same engineering constraints apply, and the same design principles solve them.
This is the most important mapping, and it has direct engineering consequences.
The context window is finite. Every token you put in is a token that can't be used for something else. Verbose API responses, unnecessary fields, large tool schemas: these are all memory allocations. If you fill the context with data, the agent can't reason.
The OS parallels are precise: the context window is RAM, every token placed in it is an allocation, and anything the agent does not need right now should stay paged out until it asks for it.
The #1 job of an agent platform is to keep the context window clean for reasoning. Every architectural decision should be evaluated through this lens: does this consume more or less of the context budget?
Programs don't write directly to disk. They call write(), and the kernel handles buffering, permissions, journaling, and device-specific quirks. This abstraction is what makes programs portable and reliable.
The same applies to agents. An agent shouldn't construct HTTP requests with auth headers, manage pagination cursors, handle retry backoff, or parse nested response wrappers. It should call a tool — a syscall — and the MCP server (the kernel) handles all of that.

The tool is the syscall. The MCP server is the kernel. Same contract every time. The agent never has to think about x-api-key headers, accountIdentifier query parameters, or exponential backoff on HTTP 429.
An OS doesn't load every file into RAM upfront. It uses virtual memory — a large address space backed by on-demand paging. Hot pages stay in RAM; cold pages live on disk until needed.
Our MCP server applies the same pattern to domain knowledge. The agent's "address space" covers 140+ resource types. But at any given moment, only the relevant metadata occupies context.
This is demand paging for domain knowledge. The agent discovers what it needs, when it needs it, and the rest stays "on disk" (available but not occupying context).
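Here is a rough sketch of that flow, assuming a generic callTool() helper supplied by the MCP host; the shape of the describe response (an operations list) is illustrative.

```typescript
// Demand paging for domain knowledge, sketched against a hypothetical
// callTool() helper from the MCP host. Only pipeline metadata is paged
// into context; the other 140+ resource types stay "on disk".
async function runPipelineDemandPaged(
  callTool: (name: string, args: Record<string, unknown>) => Promise<any>
) {
  // Page in just the schema the agent needs for this task.
  const pipelineMeta = await callTool("harness_describe", { resource_type: "pipeline" });

  // The shape of pipelineMeta is illustrative: assume it reports the valid
  // operations and runtime inputs for pipelines.
  if (!pipelineMeta.operations?.includes("run")) {
    throw new Error("pipeline does not support 'run'");
  }

  // Act using what was just paged in.
  return callTool("harness_execute", {
    resource_type: "pipeline",
    action: "run",
    inputs: { branch: "main" },
  });
}
```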
The MCP server has three layers, each corresponding to a layer in the OS model:


Layer 1 — MCP Tool Surface (syscall table). Ten generic tools that accept a resource_type parameter and dispatch through the registry. These are registered with the MCP SDK using Zod schemas for input validation. Each tool handler is a thin wrapper: normalize inputs → call registry → format response.
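A minimal sketch of one such handler, assuming a Registry and HarnessClient along the lines of the next two layers; the input fields are illustrative, not the exact Harness schemas.

```typescript
import { z } from "zod";

// Minimal interfaces for the layers below; see the Layer 2 and Layer 3 sketches.
interface HarnessClient {}
interface Registry {
  dispatch(client: HarnessClient, type: string, op: string, input: object): Promise<unknown>;
}

// Input schema for a generic "list" verb. Zod rejects malformed agent input
// before any API budget is spent.
const listInput = z.object({
  resource_type: z.string(),
  filters: z.record(z.unknown()).optional(),
  page: z.number().int().min(0).default(0),
});

// Thin wrapper: normalize inputs -> call registry -> format response.
async function handleList(rawArgs: unknown, registry: Registry, client: HarnessClient) {
  const input = listInput.parse(rawArgs);
  const result = await registry.dispatch(client, input.resource_type, "list", input);
  // Return a compact, structured payload so the agent's context stays clean.
  return { content: [{ type: "text", text: JSON.stringify(result, null, 2) }] };
}
```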
Layer 2 — Registry (kernel). The Registry class in src/registry/index.ts is the core dispatch engine. It holds a Map<string, ResourceDefinition> populated from 30 toolset files. When a tool handler calls registry.dispatch(client, resourceType, operation, input), the registry resolves the ResourceDefinition, looks up the EndpointSpec, and calls executeSpec() — the single execution pipeline that handles path templating, scope injection, query parameter building, body construction, auth header interpolation, HTTP dispatch, response extraction, and deep link generation.
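In outline, the dispatch path looks roughly like this. This is a sketch, not the actual implementation; the real executeSpec() also handles path templating, scope injection, body construction, and deep links.

```typescript
// Sketch of the Layer 2 dispatch engine. Real specs carry more fields
// (path params, query mappings, body builders, hints); this shows the shape.
interface EndpointSpec {
  method: "GET" | "POST" | "PUT" | "DELETE";
  path: string;                                  // e.g. "/ng/api/{resource}/{id}" (illustrative)
  responseExtractor?: (raw: unknown) => unknown; // trim verbose API payloads
}

interface ResourceDefinition {
  operations: Record<string, EndpointSpec>;      // "list" | "get" | "create" | ...
}

interface HarnessClient {
  request(spec: EndpointSpec, input: Record<string, unknown>): Promise<unknown>;
}

class Registry {
  private resources = new Map<string, ResourceDefinition>();

  register(type: string, def: ResourceDefinition) {
    this.resources.set(type, def); // toolset files populate this at startup
  }

  async dispatch(client: HarnessClient, type: string, op: string, input: Record<string, unknown>) {
    const spec = this.resources.get(type)?.operations[op];
    if (!spec) throw new Error(`No ${op} operation registered for resource type '${type}'`);
    return executeSpec(client, spec, input);     // one execution path for every resource
  }
}

async function executeSpec(client: HarnessClient, spec: EndpointSpec, input: Record<string, unknown>) {
  const raw = await client.request(spec, input); // HTTP dispatch lives behind the client
  return spec.responseExtractor ? spec.responseExtractor(raw) : raw;
}
```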
Layer 3 — HarnessClient (block device driver). The raw HTTP client in src/client/harness-client.ts. Handles fetch() with the x-api-key auth header, accountIdentifier injection, retry with exponential backoff on 429/5xx, client-side rate limiting, timeouts, and response parsing.
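A sketch of that concern in isolation: the base URL, retry counts, and delays here are illustrative, while the x-api-key header and accountIdentifier parameter are the ones described above.

```typescript
// Sketch of the Layer 3 driver: auth injection plus retry with exponential
// backoff on 429/5xx responses.
async function harnessRequest(
  path: string,
  opts: { method: string; body?: unknown },
  apiKey: string,
  accountId: string
): Promise<unknown> {
  const url = `https://app.harness.io${path}?accountIdentifier=${encodeURIComponent(accountId)}`;
  for (let attempt = 0; attempt < 4; attempt++) {
    const res = await fetch(url, {
      method: opts.method,
      body: opts.body === undefined ? undefined : JSON.stringify(opts.body),
      headers: { "x-api-key": apiKey, "content-type": "application/json" },
    });
    if (res.status === 429 || res.status >= 500) {
      // Back off and retry on rate limits and transient server errors.
      await new Promise((resolve) => setTimeout(resolve, 250 * 2 ** attempt));
      continue;
    }
    if (!res.ok) throw new Error(`Harness API ${res.status}: ${await res.text()}`);
    return res.json();
  }
  throw new Error("Retries exhausted");
}
```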
The agent learns 10 verbs. They work against every domain in Harness.
Every tool the agent "sees" costs tokens: its name, description, and input schema all land in the context window before the agent has done anything.
Our approach keeps that overhead at roughly 1.2% of the context window. Tool count stays O(1). Capabilities grow O(n). This is the core design invariant.
The registry is a vtable — a dispatch table that maps (resource_type, operation) to an EndpointSpec and executes it through a unified pipeline. One execution path. Every resource type. Every operation.
Principle: Don't create a tool per API endpoint. Create generic verbs that dispatch by resource type through a registry.
This is the same insight behind REST (uniform interface + varying resources) and Unix (uniform file interface + varying devices). The agent learns the grammar once — list, get, create, execute. New nouns (resource types) are just data in the registry.
Principle: Resource definitions are data structures, not handler functions.
Each API mapping is expressed as an EndpointSpec — a declarative object that describes the HTTP method, path, path parameters, query parameter mappings, body builder, response extractor, and metadata. The registry's executeSpec() reads this spec and handles execution.
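As an illustration of the pattern, a toolset entry might look something like this; the path, parameter names, and response fields are invented for the example, not the actual Harness API shapes.

```typescript
// Illustrative toolset entry expressed as data rather than a handler function.
// Paths, parameter names, and response fields are made up for illustration.
const pipelineRunSpec = {
  method: "POST" as const,
  path: "/pipeline/api/pipelines/{pipelineIdentifier}/execute",
  pathParams: ["pipelineIdentifier"],
  queryParams: { branch: "branch" },            // input field -> query parameter name
  buildBody: (input: { inputs?: Record<string, unknown> }) => ({
    runtimeInputs: input.inputs ?? {},
  }),
  // Return only what the agent needs, not the full nested API response.
  responseExtractor: (raw: any) => ({
    executionId: raw?.data?.executionId,
    status: raw?.data?.status,
  }),
  executeHint: "Call harness_describe first to discover valid runtime inputs.",
};
```

A change to how any of these fields is interpreted is a change to executeSpec(), and it applies to every resource type at once.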
This means adding or changing an API mapping is a data edit, not new handler code, and every mapping flows through the same, already-tested execution path.
Principle: Centralized dispatch creates compounding returns on infrastructure investment.
Features that propagate everywhere through the registry: auth and scope injection, retries and rate limiting, input validation, response extraction, and deep link generation are implemented once and apply to every resource type.
Move error detection left. Validate agent-generated inputs before spending API budget on execution.
When an agent tries to create or execute something, validate the inputs before committing. If the agent provides a malformed pipeline YAML or references a nonexistent service, catch it at the schema level — before the API call burns tokens on a 400 error and the agent has to parse the response to figure out what went wrong.
harness_execute(
resource_type='pipeline', action='run',
inputs={branch: 'main', service: 'payment-svc'}
)
← Error: input 'service' is not a valid runtime input for this pipeline.
Valid inputs: branch, environment, tag. Did you mean 'environment'?
// Agent retries with corrected inputs. Typically converges in one retry.

Validation is cheap — milliseconds. Wrong answers are expensive — broken trust, bad decisions. This is compile-time checking for agent-generated operations.
Principle: Let agents discover your domain model at runtime. Self-describing systems don't need documentation updates.
Add a new toolset → the agent discovers it immediately. Add a new resource type → the agent can query it immediately. This is introspection — ls for your platform. The same thing that makes bash work for agents.
Problem: Creating list_pipelines, get_pipeline, list_services, etc. Each tool costs ~150 tokens. At 50 tools, that's 7,500 tokens of menu.
Fix: Generic verbs with type dispatch.
Problem: Returning the full Harness API response — 50+ fields, nested wrappers.
Fix: Use responseExtractor to return clean, relevant fields. Treat context tokens like memory allocations.
Problem: Embedding field lists or API shapes in tool descriptions. They go stale immediately.
Fix: Keep tool descriptions generic. Point to harness_describe() for runtime discovery.
Problem: Agents often fetch large datasets and aggregate in context, leading to extremely high token usage and degraded accuracy.
Fix: Routing aggregation to the Knowledge Graph dramatically reduces token usage and improves answer reliability.
Problem: Adding per-resource docs to instructions in src/index.ts.
Fix: Keep instructions under ~20 lines. Put resource-specific guidance in description, diagnosticHint, executeHint, and bodySchema.description on the EndpointSpec.
The agent loop is the new operating system. That’s not a rhetorical flourish. It’s a constraint with real engineering consequences.
Every design decision in the Harness MCP server follows from a single principle: the context window is RAM, and RAM is finite. Verbose responses trash it. Oversized tool menus fragment it. Redundant schemas waste it. The agent's ability to reason (to observe, hypothesize, act, and verify) degrades in direct proportion to how much of that budget gets consumed by infrastructure noise instead of domain signal.
The patterns described here (generic verbs with type dispatch, declarative resource definitions, demand-paged schema discovery, and centralized kernel dispatch) aren't novel. They're the same patterns that made Unix, Git, Kubernetes, and SQL endure for decades: small, stable interfaces, uniform contracts, runtime introspection, and the ability to extend without changing the core interaction model.
What’s different is the audience. Those systems were designed for programs. This one is designed for reasoning systems operating at runtime.
If you're building agent infrastructure, the questions to ask are the same ones OS designers asked in the 1970s: Does this abstraction compose? Does it describe itself? Does it keep the critical resource (then RAM, now context) available for the work that actually matters?
A useful test for any tool, schema, or abstraction is simple: does it reduce the amount of information the agent has to hold in working memory, or increase it? If it increases it, it’s probably making the system worse.
—
If you found this useful, follow and subscribe to the Harness Engineering blog for more deep dives on building agent-native systems and modern developer infrastructure.
Need more info? Contact Sales