
Engineering teams are generating more shippable code than ever before — and today, Harness is shipping five new capabilities designed to help teams release confidently. AI coding assistants lowered the barrier to writing software, and the volume of changes moving through delivery pipelines has grown accordingly. But the release process itself hasn't kept pace.
The evidence shows up in the data. In our 2026 State of DevOps Modernization Report, we surveyed 700 engineering teams about what AI-assisted development is actually doing to their delivery. One finding stands out: while 35% of the most active AI coding users are already releasing daily or more, those same teams have the highest rate of deployments needing remediation (22%) and the longest MTTR at 7.6 hours.
This is the velocity paradox: the faster teams can write code, the more pressure accumulates at the point of release, where the process hasn't changed nearly as much as the tooling that feeds it.
The AI Delivery Gap
What changed is well understood. For years, the bottleneck in software delivery was writing code. Developers couldn't produce changes fast enough to stress the release process. AI coding assistants changed that. Teams are now generating more change across more services, more frequently than before — but the tools for releasing that change are largely the same.
In the past, DevSecOps vendors built entire separate products to coordinate multi-team, multi-service releases. That made sense when CD pipelines were simpler. It doesn't make sense now. At AI speed, a separate tool means another context switch, another approval flow, and another human-in-the-loop at exactly the moment you need the system to move on its own.
The tools that help developers write code faster have created a delivery gap that only widens as adoption grows.
Today Harness is releasing five capabilities, all natively integrated into Continuous Delivery. Together, they cover the full arc of a modern release: coordinating changes across teams and services, verifying health in real time, managing schema changes alongside code, and progressively controlling feature exposure.
Release Orchestration replaces Slack threads, spreadsheets, and war-room calls that still coordinate most multi-team releases. Services and the teams supporting them move through shared orchestration logic with the same controls, gates, and sequence, so a release behaves like a system rather than a series of handoffs. And everything is seamlessly integrated with Harness Continuous Delivery, rather than in a separate tool.
AI-Powered Verification and Rollback connects to your existing observability stack, automatically identifies which signals matter for each release, and determines in real time whether a rollout should proceed, pause, or roll back. Most teams have rollback capability in theory. In practice it's an emergency procedure, not a routine one. Ancestry.com made it routine and saw a 50% reduction in overall production outages, with deployment-related incidents dropping significantly.
Database DevOps, now with Snowflake support, brings schema changes into the same pipeline as application code, so the two move together through the same controls with the same auditability. If a rollback is needed, the application and database schema can roll back together seamlessly. This matters especially for teams building AI applications on warehouse data, where schema changes are increasingly frequent and consequential.
Improved pipeline and policy support for feature flags and experimentation lets teams deploy safely and release progressively to the right users, even as AI-generated code drives up the number of releases. They can quickly measure impact on technical and business metrics, and stop or roll back when results are off track. All of this happens within the familiar Harness interface teams already use for CI/CD.
Warehouse-Native Feature Management and Experimentation lets teams test features and measure business impact directly with data warehouses like Snowflake and Redshift, without ETL pipelines or shadow infrastructure. This way they can keep PII and behavioral data inside governed environments for compliance and security.
These aren't five separate features. They're one answer to one question: can we safely keep going at AI speed?
Traditional CD pipelines treat deployment as the finish line. The model Harness is building around treats it as one step in a longer sequence: application and database changes move through orchestrated pipelines together, verification checks real-time signals before a rollout continues, features are exposed progressively, and experiments measure actual business outcomes against governed data.
A release isn't complete when the pipeline finishes. It's complete when the system has confirmed the change is healthy, the exposure is intentional, and the outcome is understood.
That shift from deployment to verified outcome is what Harness customers say they need most. "AI has made it much easier to generate change, but that doesn't mean organizations are automatically better at releasing it," said Marc Pearce, Head of DevOps at Intelliflo. "Capabilities like these are exactly what teams need right now. The more you can standardize and automate that release motion, the more confidently you can scale."
The real shift here is operational. The work of coordinating a release today depends heavily on human judgment, informal communication, and organizational heroics. That worked when the volume of change was lower. As AI development accelerates, it's becoming the bottleneck.
The release process needs to become more standardized, more repeatable, and less dependent on any individual's ability to hold it together at the moment of deployment. Automation doesn't just make releases faster. It makes them more consistent, and consistency is what makes scaling safe.
For Ancestry.com, implementing Harness helped them achieve 99.9% uptime by cutting outages in half while accelerating deployment velocity threefold.
At Speedway Motors, progressive delivery and 20-second rollbacks enabled a move from biweekly releases to multiple deployments per day, with enough confidence to run five to ten feature experiments per sprint.
AI made writing code cheap. Releasing that code safely, at scale, is still the hard part.
Harness Release Orchestration, AI-Powered Verification and Rollback, Database DevOps, Warehouse-Native Feature Management and Experimentation, and Improved Pipeline and Policy Support for FME are available now. Learn more and book a demo.

On March 19th, the risks of running open execution pipelines — where what code runs in your CI/CD environment is largely uncontrolled — went from theoretical to catastrophic.
A threat actor known as TeamPCP compromised the GitHub Actions supply chain at a scale we haven't seen before (tracked as CVE-2026-33634, CVSS 9.4). They compromised Trivy, the most widely used vulnerability scanner in the cloud-native ecosystem, and turned it into a credential-harvesting tool that ran inside victims' own pipelines.
Between March 19 and March 24, 2026, organizations running affected tag-based GitHub Actions references were sending their AWS tokens, SSH keys, and Kubernetes secrets directly to the attacker. SANS Institute estimates over 10,000 CI/CD workflows were directly affected. According to multiple security research firms, the downstream exposure extends to tens of thousands of repositories and hundreds of thousands of accounts.
Five ecosystems. Five days. One stolen Personal Access Token.
This is a fundamental failure of the open execution pipeline model — where what runs in your pipeline is determined by external references to public repositories, mutable version tags, and third-party code that executes with full privileges. GitHub Actions is the most prominent implementation.
The alternative, governed execution pipelines, where what runs is controlled through policy gates, customer-owned infrastructure, scoped credentials, and immutable references, is the model we designed Harness around years ago, precisely because we saw this class of attack coming.
TeamPCP wasn't an anomaly; it was the inevitable conclusion of an eighteen-month escalation in CI/CD attack tactics.
CVE-2025-30066. Attackers compromised a PAT from an upstream dependency (reviewdog/action-setup) and force-pushed malicious code to every single version tag of tj-actions/changed-files. 23,000 repositories were exposed. The attack was later connected to a targeted campaign against Coinbase. CISA issued a formal advisory.
This proved that the industry's reliance on mutable tags (like @v2) was a serious structural vulnerability. According to Wiz, only 3.9% of repositories pin to immutable SHAs. The other 96% are trusting whoever owns the tag today.
Then came the first self-replicating worm in the CI/CD ecosystem: Shai-Hulud 2.0 backdoored 796 npm packages representing over 20 million weekly downloads, including packages from Zapier, PostHog, and Postman.
It used TruffleHog to harvest 800+ credential types, registered compromised machines as self-hosted GitHub runners named SHA1HULUD for persistent C2 over github.com, and built a distributed token-sharing network where compromised machines could replace each other's expired credentials.
PostHog's candid post-mortem revealed that attackers stole their GitHub bot's PAT via a pull_request_target workflow exploit, then used it to steal npm publishing tokens from CI runner secrets. Their admission that this kind of attack "simply wasn't something we'd prepared for" reflects the industry-wide gap between application security and CI/CD security maturity. CISA issued another formal advisory.
TeamPCP went after the security tools themselves.
They exploited a misconfigured GitHub Actions workflow to steal a PAT from Aqua Security's aqua-bot service account. Aqua detected the breach and initiated credential rotation — but reporting suggests the rotation did not fully cut off attacker access. TeamPCP appears to have retained or regained access to Trivy's release infrastructure, enabling the March 19 attack weeks after initial detection.
On March 19, they force-pushed a malicious "Cloud Stealer" to 76 of 77 version tags in trivy-action and all 7 tags in setup-trivy. Simultaneously, they published an infected Trivy binary (v0.69.4) to GitHub Releases and Docker Hub. Every pipeline referencing those tags by name started executing the attacker's code on its next run. No visible change to the release page. No notification. No diff to review.
TeamPCP's payload was purpose-built for CI/CD runner environments:
Memory Scraping. It read /proc/*/mem to extract decrypted secrets held in RAM. GitHub's log-masking can't hide what's in process memory.
Cloud Metadata Harvesting. It queried the AWS Instance Metadata Service (IMDS) at 169.254.169.254, pivoting from "build job" to full IAM role access in the cloud.
Filesystem Sweep. It searched over 50 specific paths — .env files, .aws/credentials, .kube/config, SSH keys, GPG keys, Docker configs, database connection strings, and cryptocurrency wallet keys.
Encrypted Exfiltration. All data was bundled into tpcp.tar.gz, encrypted with AES-256 and RSA-4096, and sent to typosquatted domains like scan.aquasecurtiy[.]org (note the "tiy"). These domains returned clean verdicts from threat intelligence feeds during the attack. As a fallback, the stealer created public GitHub repos named tpcp-docs under the victim's own account.
The malicious payload executed before the legitimate Trivy scan. Pipelines appeared to work normally. CrowdStrike noted: "To an operator reviewing workflow logs, the step appears to have completed successfully."
Sysdig observed that the vendor-specific typosquat domains were a deliberate deception — an analyst reviewing CI/CD logs would see traffic to what appears to be the vendor's own domain.
It took Aqua five days to fully evict the attacker, during which TeamPCP pushed additional malicious Docker images (v0.69.5 and v0.69.6).
Why did this work so well? Because GitHub Actions is the leading example of an open execution pipeline — where what code runs in your pipeline is determined by external references that anyone can modify.
This trust problem isn't new. Jenkins had a similar issue with plugins. Third-party code ran with full process privileges. But Jenkins ran inside your firewall; exfiltrating data required getting past your network perimeter.
GitHub Actions took the same open execution approach but moved execution to cloud-hosted runners with broad internet egress, making exfiltration trivially easy. TeamPCP's Cloud Stealer just needed to make an HTTPS POST to an external domain, which runners are designed to do freely.
Here are a few reasons why open execution pipelines break at scale:
Mutable Trust. When you use @v2, you are trusting a pointer, not a piece of code. Tags can be silently redirected by anyone with write access. TeamPCP rewrote 76 tags in a single operation. 96% of the ecosystem is exposed.
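The mechanics are easy to model. The toy resolver below (plain dicts standing in for GitHub's tag and commit stores; nothing here is a real API) shows why a pinned SHA survives a tag rewrite while a tag reference does not:

```python
# Toy model: dicts stand in for GitHub's tag and commit stores.
commits = {"abc123": "audited code", "evil99": "credential stealer"}
tags = {"v2": "abc123"}      # a tag is a mutable pointer anyone with write access can move

pinned_ref = "abc123"        # a full commit SHA is immutable

tags["v2"] = "evil99"        # attacker force-pushes the tag: no new version, no diff

print(commits[tags["v2"]])   # credential stealer -- the tag now resolves to attacker code
print(commits[pinned_ref])   # audited code -- the pinned SHA still resolves to what was reviewed
```

The workflow file never changed, which is exactly why the ecosystem-wide attack left no diff to review.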
Flat Privileges. Third-party Actions run with the same permissions as your code. No sandbox. No permission isolation. This is why TeamPCP targeted security scanners — tools that by design have elevated access to your pipeline infrastructure. The attacker doesn't need to break in. The workflow invites them in.
Secret Sprawl. Secrets are typically injected into the runner's environment or process memory during job execution, where they remain accessible for the job's duration. TeamPCP's /proc/*/mem scraper didn't need any special privilege. It just needed to be running on the same machine.
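A tame illustration of the same exposure, assuming a Linux environment: any process running as the same user can read a sibling process's initial environment straight out of /proc, no elevated privilege required. The DEPLOY_TOKEN value is simulated, not a real secret:

```python
import os
import subprocess
import time

# Simulate a pipeline-injected secret in a job process's environment.
env = dict(os.environ, DEPLOY_TOKEN="s3cr3t")
child = subprocess.Popen(["sleep", "5"], env=env)

time.sleep(0.2)  # let the child finish exec'ing

# Any same-user process can read the child's initial environment
# from /proc -- log masking never sees this path.
with open(f"/proc/{child.pid}/environ", "rb") as f:
    raw = f.read().decode(errors="replace")
child.kill()

print("DEPLOY_TOKEN=s3cr3t" in raw)
```

TeamPCP's scraper went further, reading decrypted secrets out of live process memory, but the privilege model is the same: co-residency on the runner is sufficient.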
Unbounded Credential Cascades. There is no architectural boundary that stops a credential stolen in one context from unlocking another. TeamPCP proved this definitively: Trivy → Checkmarx → LiteLLM → AI API keys across thousands of enterprises. One PAT, five ecosystems.
Harness CI/CD pipelines are built as governed execution pipelines — where what runs is controlled through customer-owned infrastructure, policy gates, scoped credentials, immutable references, and explicit trust boundaries. At its core is the Delegate — a lightweight worker process that runs inside your infrastructure (your VPC, your Kubernetes cluster), executes tasks locally, and communicates with the Harness control plane via outbound-only connections.
When we designed this architecture, we assumed the execution plane would become the primary target in the enterprise. If TeamPCP tried to attack a Harness-powered environment, they would hit three architectural walls.
Wall 1: Network Egress You Control
The Architecture.
The Delegate lives inside your VPC or cluster. It communicates with our SaaS control plane via outbound-only HTTPS/WSS. No inbound ports are opened.
The Defense.
You control the firewall. Allowlist app.harness.io and the specific endpoints your pipelines need, deny everything else. TeamPCP's exfiltration to typosquat domains would fail at the network layer — not because of a detection rule, but because the path doesn't exist. Both typosquat domains returned clean verdicts from threat intel feeds. Egress filtering by allowlist is more reliable than detection by reputation.
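As a sketch, deny-by-default egress reduces to membership in an allowlist. The check below is illustrative (the ALLOWED_HOSTS set is an assumption for this example; in practice the rule lives in your firewall or egress proxy, not application code), with the typosquat domain taken from the incident narrative:

```python
from urllib.parse import urlparse

# Illustrative allowlist; in practice this is a firewall or egress-proxy rule.
ALLOWED_HOSTS = {"app.harness.io"}

def egress_allowed(url: str) -> bool:
    # Deny by default: traffic may only leave for explicitly allowlisted hosts.
    return urlparse(url).hostname in ALLOWED_HOSTS

print(egress_allowed("https://app.harness.io/gateway"))        # True
print(egress_allowed("https://scan.aquasecurtiy.org/upload"))  # False: typosquat never gets a connection
```

Note what the check does not depend on: the reputation of the destination. A brand-new typosquat with a clean threat-intel verdict is blocked exactly as fast as a known-bad domain.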
Wall 2: Runtime Secret Resolution
The Architecture.
Rather than bulk-injecting secrets as flat environment variables at job start, Harness can resolve secrets at runtime through your secret manager — HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault — via the Delegate, inside your network. Harness SaaS stores encrypted references and metadata, not plaintext secret values.
The Defense.
TeamPCP's Cloud Stealer worked because in an open execution pipeline, secrets are typically injected into the runner's process memory where they remain accessible for the job's duration. In a governed execution pipeline, this exposure is structurally reduced: secrets can be resolved from your controlled vault at the point they're needed, rather than broadcast as environment variables to every step in the pipeline.
An important caveat: Vault-based resolution alone doesn't eliminate runtime exfiltration. Once a secret is resolved and passed to a step that legitimately needs it — say, an npm token during npm publish — that secret exists in the step's runtime. If malicious code is executing in that same context (for example, a tampered package.json that exfiltrates credentials during npm run test), the secret is exposed regardless of where it came from. This is why the three walls work as a system: Wall 2 reduces the surface of secret exposure, Wall 1 blocks the exfiltration path, and (as we'll see) Wall 3 limits the blast radius to the scoped environment. No single wall is sufficient on its own.
To further strengthen how pipelines use secrets, leverage ephemeral credentials — AWS STS temporary tokens, Vault dynamic secrets, or GCP short-lived service account tokens — that auto-expire after a defined window, often minutes. Even if TeamPCP’s memory scraper extracted an ephemeral credential, it likely would have expired before the attacker could pivot to the next target.
Wall 3: Environment-Scoped Execution
The Architecture.
Harness supports environment-scoped delegates as a core architecture pattern. Your "Dev" scanner delegate runs in a different cluster, with different network boundaries and different credentials, than your "Prod" deployment delegate.
The Defense.
The credential cascade that defined TeamPCP hits a dead end. Stolen Dev credentials cannot reach Production publishing gates or AI API keys, because those credentials live in a different vault, resolved by a different delegate, in a different network segment. If the Trivy compromise only yielded credentials scoped to a dev environment, the attack stops at phase one.
Beyond the walls, governed execution pipelines add further structural controls, including policy gates, immutable references, and auditing of every action the pipeline takes.
Architecture is a foundation, not a guarantee. Governed execution pipelines are materially safer against this class of attack, but you can still create avoidable risk: running unvetted containers on delegates, skipping egress filtering, sharing one delegate across dev and prod, granting overly broad cloud access, exposing excessive secrets to jobs that don't need them, or relying on long-lived static credentials when ephemeral alternatives exist.
I am not claiming that Harness is safe and GitHub Actions is unsafe. That would be too simplistic.
What I am claiming is that governed execution pipelines — where what runs is controlled through policy gates, customer-owned infrastructure, scoped credentials, and immutable references — are a materially safer foundation than open execution pipelines. We designed Harness as our implementation of a governed execution pipeline. But architecture is a starting point — you still have to operate it well.
As we enter the era of Agentic AI — where AI is generating pipelines, suggesting dependencies, and submitting pull requests at machine speed — we can no longer rely on human review to catch a malicious tag in an AI-generated PR.
But there's a more fundamental shift: AI agents will become the primary actors inside CI/CD pipelines. Not just generating code — autonomously executing tasks, selecting dependencies, making deployment decisions, remediating incidents.
Now imagine an AI agent in an open execution pipeline — downloaded from a public marketplace, referenced by a mutable tag, executing with full privileges, making dynamic runtime decisions you didn't define. It has access to your secrets, your cloud credentials, and your deployment infrastructure. Unlike a static script, an agent makes decisions at runtime — fetching resources, calling APIs, modifying files.
If TeamPCP showed us what happens when a static scanner is compromised, imagine what happens when an autonomous AI agent is compromised — or simply makes a decision you didn't anticipate.
This is why governed execution pipelines aren't just a security improvement — they're an architectural prerequisite for the AI era. In a governed pipeline, even an AI agent operates within structural boundaries: it runs on infrastructure you control, accesses only scoped secrets, has restricted egress, and its actions are audited. The agent may be autonomous, but the pipeline constrains what it can reach.
The questions every engineering leader should be asking:
If you use Trivy, Checkmarx, or LiteLLM: did any pipeline reference the affected tags or binaries between March 19 and March 24, and have you rotated every credential those runs could have reached?
If you use GitHub Actions: are third-party actions pinned to immutable commit SHAs rather than mutable tags, and is runner egress restricted to an explicit allowlist?
For the longer term: does your pipeline architecture put structural boundaries (scoped credentials, customer-owned infrastructure, policy gates) between third-party code and your production secrets?
I'm writing this as the CEO of a company that competes with GitHub in the CI/CD space. I want to be transparent about that.
But I'm also writing this as someone who has spent two decades building infrastructure software and who saw this threat model coming. When we designed Harness, the open execution pipeline model had already evolved from Jenkins plugins to GitHub Actions — each generation making it easier for third-party code to run with full privileges and, by moving execution further from the customer's network perimeter, making exfiltration easier. We deliberately chose to build governed execution pipelines instead.
The TeamPCP campaign didn't teach us anything new about the risk. What it did was make the difference between open and governed execution impossible for the rest of the industry to ignore.
Open source security tools are invaluable. The developers and companies who build them — including Aqua Security and Checkmarx — are doing essential work. The problem isn't the tools. The problem is running them inside open execution pipelines where third-party code has full privileges, secrets sit in memory, and exfiltration faces no structural barrier.
If you want to explore how the delegate architecture works in practice, we're here to show you. But more importantly, regardless of what platform you choose, please take these structural questions seriously. The next TeamPCP is already studying the credential graph.

Over the last few years, something fundamental has changed in software development.
If the early 2020s were about adopting AI coding assistants, the next phase is about what happens after those tools accelerate development. Teams are producing code faster than ever. But what I’m hearing from engineering leaders is a different question:
What’s going to break next?
That question is exactly what led us to commission our latest research, State of DevOps Modernization 2026. The results reveal a pattern that many practitioners already sense intuitively: faster code generation is exposing weaknesses across the rest of the software delivery lifecycle.
In other words, AI is multiplying development velocity, but it’s also revealing the limits of the systems we built to ship that code safely.
One of the most striking findings in the research is something we've started calling the AI Velocity Paradox, a term we coined in our 2025 State of Software Engineering Report.
Teams using AI coding tools most heavily are shipping code significantly faster. In fact, 45% of developers who use AI coding tools multiple times per day deploy to production daily or faster, compared to 32% of daily users and just 15% of weekly users.
At first glance, that sounds like a huge success story. Faster iteration cycles are exactly what modern software teams want.
But the data tells a more complicated story.
Among those same heavy AI users, deployments need remediation at the highest rate in the survey (22%), and MTTR stretches to 7.6 hours.
What this tells me is simple: AI is speeding up the front of the delivery pipeline, but the rest of the system isn't scaling with it. It's like running trains faster than the tracks were built for: friction builds, the ride gets bumpy, and we edge toward disaster.

The result is downstream friction: more incidents, more manual work, and more operational stress on engineering teams.
To understand why this is happening, you have to step back and look at how most DevOps systems actually evolved.
Over the past 15 years, delivery pipelines have grown incrementally. Teams added tools to solve specific problems: CI servers, artifact repositories, security scanners, deployment automation, and feature management. Each step made sense at the time.
But the overall system was rarely designed as a coherent whole.
In many organizations today, quality gates, verification steps, and incident recovery still rely heavily on human coordination and manual work. In fact, 77% of respondents say teams often have to wait on other teams for routine delivery tasks.
That model worked when release cycles were slower.
It doesn’t work as well when AI dramatically increases the number of code changes moving through the system.
Think of it this way: if AI doubles the number of changes engineers can produce, your pipelines must scale to absorb that volume.
Otherwise, the system begins to crack under pressure. The burden falls directly on developers to deploy services safely, certify compliance checks, and keep rollouts progressing. When failures happen, they have to jump in and remediate, whatever the hour.
These manual tasks, naturally, inhibit innovation and cause developer burnout. That’s exactly what the research shows.
Across respondents, developers report spending roughly 36% of their time on repetitive manual tasks like chasing approvals, rerunning failed jobs, or copy-pasting configuration.
As delivery speed increases, the operational load increases. That burden often falls directly on developers.
The good news is that this problem isn’t mysterious. It’s a systems problem. And systems problems can be solved.
From our experience working with engineering organizations, we've identified a few principles that consistently help teams scale AI-driven development safely.
When every team builds pipelines differently, scaling delivery becomes difficult.
Standardized templates (or “golden paths”) make it easier to deploy services safely and consistently. They also dramatically reduce the cognitive load for developers.
Speed only works when feedback is fast.
Automating security, compliance, and quality checks earlier in the lifecycle ensures problems are caught before they reach production. That keeps pipelines moving without sacrificing safety.
Feature flags, automated rollbacks, and progressive rollouts allow teams to decouple deployment from release. That flexibility reduces the blast radius of new changes and makes experimentation safer.
It also allows teams to move faster without increasing production risk.
Automation alone doesn’t solve the problem. What matters is creating a feedback loop: deploy → observe → measure → iterate.
When teams can measure the real-world impact of changes, they can learn faster and improve continuously.
AI is already changing how software gets written. The next challenge is changing how software gets delivered.
Coding assistants have increased development teams' capacity to innovate. But to capture the full benefit, the delivery systems behind them must evolve as well.
The organizations that succeed in this new environment will be the ones that treat software delivery as a coherent system, not just a collection of tools.
Because the real goal isn’t just writing code faster. It’s learning faster, delivering safer, and turning engineering velocity into better outcomes for the business.
And that requires modernizing the entire pipeline, not just the part where code is written.
If you’ve ever pushed a feature branch that quietly triggered multiple production deployments—and only realized the impact when the AWS bill jumped 240% month-over-month—you already understand the problem.
Cost awareness in CI/CD pipelines isn’t about slowing teams down. It’s about avoiding financial surprises that lead to tense finance meetings and urgent cost-cutting exercises.
Many platform engineering teams treat cloud spend as something to review after deployment. Code ships. Infrastructure scales. Services consume resources. Weeks later, someone from finance posts a Slack screenshot of a sharply rising spend graph. By that point, the expensive workload has been running for days or weeks. Rolling it back risks disruption. And the engineers who shipped it are already focused on the next sprint.
That reactive model might work when cloud usage is stable and margins are wide. It breaks down quickly as deployment velocity increases and systems become more complex. Without pipeline cost visibility built into your workflows, teams optimize purely for speed—without seeing the financial impact of each merge.
Traditional pipelines are designed for one purpose: delivering code to production quickly and reliably. Teams track build duration, deployment success rates, and test coverage. But cost governance usually sits outside the CI/CD system entirely.
This creates a structural gap.
Engineers making deployment decisions rarely see the cost implications of those decisions in real time. You can monitor CPU usage, memory, and latency—but not that your new microservice is quietly generating $400 per day in cross-region data transfer charges.
That disconnect creates friction between engineering and finance.
By the time a cost spike is discovered, the context behind the deployment is often lost. The feedback loop is simply too slow.
At enterprise scale, small inefficiencies compound. One team’s cost regression might be negligible. Ten teams introducing cost-heavy services every week becomes a serious budget issue. Without infrastructure cost tracking at the pipeline level, you can’t clearly attribute increases to specific deployments or commits. You see total spend rising—but not what caused it.
The goal isn’t to introduce manual approvals or slow down delivery. The goal is to make cost data visible early enough for teams to make smarter decisions before code hits production.
One of the most effective ways to enable CI/CD cost optimization is by integrating automated cost feedback loops directly into pipeline stages.
Before a deployment completes, your system should estimate the incremental cost impact and surface it alongside build and test results.
For example, a pipeline run might report the projected monthly cost delta of a change right next to its test results.
The estimates don’t need to be perfect. Directional accuracy is enough to catch major regressions. If a deployment is projected to increase monthly spend by 30%, that’s a signal to pause and evaluate. If the cost delta is minimal, the pipeline proceeds normally.
This approach enables build pipeline cost control without adding unnecessary friction.
Once cost data flows through your pipelines, the next step is establishing budget guardrails.
Pipeline cost visibility allows you to define thresholds—for example, triggering a review if service-level spend increases by more than 20%. This doesn’t block innovation; it simply ensures cost increases are intentional.
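A guardrail of this shape reduces to a small comparison. The function below is a sketch, not a Harness API; its name and parameters are illustrative, using the 20% threshold from the example above:

```python
def exceeds_guardrail(baseline_monthly: float, projected_monthly: float,
                      threshold_pct: float = 20.0) -> bool:
    # Flag a deployment for review when projected spend rises more than
    # threshold_pct over the current baseline.
    if baseline_monthly <= 0:
        return projected_monthly > 0  # any spend on a zero baseline is new
    increase_pct = (projected_monthly - baseline_monthly) / baseline_monthly * 100
    return increase_pct > threshold_pct

print(exceeds_guardrail(10_000, 13_000))  # True: a 30% jump triggers a review
print(exceeds_guardrail(10_000, 10_500))  # False: a 5% delta proceeds normally
```

The point is the placement, not the arithmetic: evaluated as a pipeline stage, the same check runs on every deployment instead of surfacing in a monthly report.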
With this model, cost reviews happen before expensive workloads reach production, not weeks after.
Infrastructure cost tracking at the pipeline level also improves attribution. Instead of reviewing spend by account or department, you can tie cost increases directly to individual pipeline runs and commits. That clarity makes DevOps cost management far more actionable.
True FinOps CI/CD integration means shifting cost ownership closer to the engineers making infrastructure decisions.
Cost becomes a first-class operational metric—alongside performance, reliability, and security.
When cost data lives in the same interface as builds and deployments, teams naturally factor it into trade-offs. You reduce the need for reactive enforcement because engineers can see and adjust in real time.
This alignment benefits engineering, platform, and finance teams alike.
Cloud cost governance pipelines work best when they support engineering velocity—not compete with it.
Harness Cloud Cost Management is designed to connect DevOps execution with financial accountability.
Unlike traditional tools that focus on billing-level reporting, Harness embeds pipeline cost visibility directly into CI/CD workflows. Engineers receive real-time cost feedback in the same system where they manage builds and deployments.
Key capabilities include real-time cost feedback inside pipeline runs, policy-based budget guardrails, and granular cost attribution by service, team, and deployment.
If a deployment exceeds predefined thresholds, the pipeline can automatically flag it or enforce policy-based controls—supporting consistent build pipeline cost control across teams.
By connecting cost allocation directly to services, teams, and pipeline runs, organizations gain granular insight into what drives spend. Conversations between finance and engineering become fact-based and collaborative rather than reactive.
For teams already using Harness CI/CD, adding cost awareness becomes a natural extension of existing workflows—no context switching required.
Learn more about Harness Cloud Cost Management and explore the cost visibility and governance capabilities available in the platform.
Cloud cost management at scale can’t rely on monthly budget reviews or occasional optimization sprints. It has to be embedded where cost decisions actually happen: inside your CI/CD pipelines.
When engineers see cost impact in real time, they make smarter trade-offs.
When platform teams enforce guardrails programmatically, they prevent regressions early.
When finance has attribution tied to specific deployments, discussions become clearer and more productive.
Cost awareness in CI/CD pipelines isn’t friction—it’s context.
The teams that succeed with CI/CD cost optimization don’t treat cost as a constraint. They treat it as an operational signal that improves engineering decisions.
If your organization has struggled with unexpected cloud spend or unclear attribution, it may be time to rethink where cost visibility lives. Embedding DevOps cost management directly into your CI/CD workflows gives you the speed of modern delivery—without sacrificing financial control.


In Part 1, we argued that most dev teams start in the wrong place. They obsess over prompts, when the real problem is structural: agents are dropped into repositories that were never designed for them. The solution was to make the repository itself agent-native through a standardized instruction layer like AGENTS.md.
But even after you fix the environment, something still breaks.
The agent starts strong. It understands the problem, follows instructions, and appears genuinely capable.
Then, somewhere along the way, things begin to drift. The code still compiles, but the logic gets inconsistent. Small mistakes creep in. Constraints are ignored. Assumptions mutate.
Nothing fails loudly. Everything just gets slightly worse.
This is the second failure mode of AI systems: context rot.
There is a persistent assumption in the industry that more context leads to better performance. If a model can handle large context windows, then giving it more information should improve accuracy.
In practice, the opposite is often true.
Recent research from Chroma shows that LLM performance degrades as input length increases, even when the model is operating well within its maximum context window. Independent analyses echo the same observation, from breakdowns of why models deteriorate in longer sessions to practical explorations of how context mismanagement impacts production systems.
This is not an edge case. It is a structural limitation.
Models do not “understand” context in a hierarchical way. They distribute attention across tokens. As context grows, signal competes with noise. Important instructions lose weight. Irrelevant details gain influence. Conflicts accumulate.
What looks like a reasoning failure is often just context degradation.
If you’ve worked with AI coding agents for more than a few hours, you’ve already seen this pattern.
A session starts with clear instructions and aligned reasoning. Over time, it fills with partial implementations, outdated assumptions, repeated instructions, and exploratory dead ends. The model doesn't forget earlier information; it simply cannot prioritize it effectively anymore.
Detailed guides on context management highlight this exact failure mode: as sessions grow, models become increasingly sensitive to irrelevant or redundant tokens, which degrade output quality. Platform-level documentation also reinforces the same principle: effective systems explicitly control how context is introduced, retained, and pruned.
In practice, this shows up as inconsistency. But underneath, it’s the predictable outcome of unmanaged context growth.
This is where teams often misdiagnose the issue.
When agents hallucinate, the instinct is to blame the model. But hallucination is often downstream of context rot.
OpenAI’s work on hallucinations explains that models are optimized to produce plausible outputs even under uncertainty. When context degrades, uncertainty increases. The model fills gaps with statistically likely answers.
So the failure chain looks like this:
Context degradation → ambiguity → confident guessing → hallucination
In other words, hallucination is not always a knowledge problem.
It is often a context management problem.
Most developers interact with AI through chat, so they treat sessions like conversations.
That mental model breaks at scale.
A long-running AI session is not a conversation. It is a stateful system.
And like any stateful system, it degrades without control.
Letting context accumulate indefinitely is equivalent to running a system without memory management. Eventually, performance collapses—not because the system is incapable, but because it is overloaded.
Once you accept that context degrades, the solution becomes straightforward: you don’t try to out-prompt the problem. You control how context evolves.
Across production teams, a consistent pattern emerges:
Plan → Execute → Reset
This is not a trick. It is operational discipline.
The most common mistake is asking the agent to write code immediately. This forces premature decisions and locks the model into an approach before it has fully understood the problem.
Instead, enforce a planning phase.
Have the model break down the task, identify dependencies, and surface uncertainties before implementation. This aligns closely with best practices in production-grade prompt engineering, where structured reasoning is prioritized over immediate generation.
Planning reduces unnecessary context growth and prevents incorrect assumptions from propagating.
Once the plan is validated, execution should be incremental.
Large, monolithic prompts create large, monolithic contexts—and those degrade fastest.
Stepwise execution keeps the working context focused. Each step introduces only the information required for that step. Errors are caught early, before they spread across the system.
This is not about slowing down development. It is about maintaining signal integrity.
Even with disciplined execution, context will eventually degrade.
The only reliable solution is to reset.
This may feel inefficient, but in practice, it is one of the highest-leverage actions you can take. A fresh session restores clarity, removes noise, and re-establishes correct prioritization of instructions.
Modern context management approaches consistently emphasize this: keep context bounded, and reintroduce only what is necessary for the task at hand.
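The Plan → Execute → Reset pattern can be sketched as a session loop. `ask_model` is a hypothetical stand-in for whatever LLM client your stack uses; the point is the session discipline, not the API:

```python
def ask_model(messages: list[dict]) -> str:
    # Placeholder: a real system would call its LLM provider here.
    return "stub response"

def run_task(task: str, steps: list[str], max_context_messages: int = 20) -> list[str]:
    system = {"role": "system", "content": "Plan before you code. Surface uncertainties."}
    session = [system]

    # Plan: break the task down before any code is generated.
    session.append({"role": "user", "content": f"Break down this task and list dependencies: {task}"})
    plan = ask_model(session)
    session.append({"role": "assistant", "content": plan})

    # Execute: one step at a time, so each call carries only focused context.
    results = []
    for step in steps:
        session.append({"role": "user", "content": f"Implement only this step: {step}"})
        reply = ask_model(session)
        session.append({"role": "assistant", "content": reply})
        results.append(reply)

        # Reset: when the session grows past its budget, start fresh and
        # reintroduce only the validated plan, not the full transcript.
        if len(session) > max_context_messages:
            session = [system, {"role": "assistant", "content": plan}]
    return results
```

The reset branch is the part teams most often skip: it deliberately discards the transcript and re-seeds the session with only the bounded context the task still needs.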
One of the most effective techniques for preventing context rot is meta-prompting.
Instead of telling the model what to do, you tell it how to approach the task.
You explicitly require it to break the task down, state its assumptions, and surface uncertainties before generating anything. This interrupts the model's default behavior of immediate generation.
Why does this work?
Because hallucinations are often driven by premature certainty. Meta-prompting introduces friction at exactly the right point—before incorrect assumptions become embedded in the context.
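A meta-prompt in this style might look like the following. The exact wording is illustrative; what matters is that it constrains how the model works rather than what it builds:

```python
# Prepended to the session before any implementation request.
META_PROMPT = """Before writing any code:
1. Restate the task in your own words.
2. List the assumptions you are making, and mark any you are unsure of.
3. Identify the dependencies and files you expect to touch.
4. Wait for confirmation of this plan before generating code."""
```

Step 4 is the friction point: the model cannot embed an incorrect assumption into the context until a human has seen it stated explicitly.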
Context rot is dangerous because it is gradual and often invisible.
Checkpoints make it observable.
At key moments, you force the model to validate its output against the original plan, the stated constraints, and the current state of the code. This transforms hidden drift into explicit feedback.
Instead of discovering problems at the end, you correct them continuously.
Part 1 of the series solved the problem of what the agent sees.
Part 2 addresses what happens over time.
AGENTS.md provides structure.
Session discipline preserves that structure.
Without AGENTS.md, the agent guesses.
Without discipline, the agent drifts.
You need both to achieve reliable outcomes.
As teams move from experimentation to production, sessions become longer and more complex. Agents interact with more systems, touch more code, and accumulate more context.
This is where most failures emerge.
Not because the model is incapable, but because the workflow is uncontrolled.
Context rot is the primary bottleneck in real-world AI engineering today.
In Part 3, we turn to a different problem.
So far, we have focused on a single agent operating within a controlled session. That constraint makes it possible to reason about context, to reset it, and to keep it aligned with the task.
But most real systems do not stay within that boundary.
As soon as you introduce multiple agents, external tools, or retrieval systems, the problem changes. Context is no longer contained in a single session. It becomes distributed across components that do not share the same state or assumptions.
At that point, failures become harder to trace. Drift is no longer local. It propagates.
This is where orchestration becomes necessary, but also where it becomes risky.
Part 3 explores how to build these systems in a way that preserves the guarantees established here. We will look at how to introduce MCPs, subagents, and external integrations without losing control over context, consistency, or behavior.


Testing database changes against production-like data removes risk from your delivery process, but to be effective it must be orchestrated, governed, and automated. Manual scripts and ad-hoc checks lack the repeatability and auditability required for modern delivery practices.
Harness Database DevOps provides a framework to embed production data testing into your CI/CD pipelines, enabling you to manage database schema changes with the same rigor as application code. Harness DB DevOps is designed to bridge development, operations, and database teams by bringing visibility, governance, and standardized execution to database changes.
Instead of treating testing with production data as an afterthought, you can define it as a pipeline stage that executes reliably across environments.
To incorporate production data testing into your delivery process, you define a Harness Database DevOps pipeline with structured, repeatable steps. The result is a governed testing model that captures evidence of correctness before any change ever reaches production.
In Harness Database DevOps, you begin by configuring the necessary database instances and schemas:
For production data testing, you provision two isolated instances seeded with a snapshot of production data (secured and masked as needed). These instances are not customer-facing; they serve as ephemeral test targets.
This structure sets up identical baselines for controlled experimentation.
Harness Database DevOps lets you define a deployment pipeline that incorporates database and application changes in the same workflow:
Using Liquibase or Flyway via Harness, the pipeline applies schema changes to Instance A while Instance B remains the baseline.
This step executes the migration in a real, production-scale context, capturing performance, constraint behaviors, and other runtime characteristics.
A powerful capability of Harness Database DevOps is automated rollback testing within the pipeline. Testing rollback paths removes the assumption that reversal will work in production, a key risk that traditional workflows often leave untested.
After rollback, you compare Instance A (post-rollback) with Instance B (untouched):
If disparities are detected, the pipeline can fail early, prompting review and remediation before production deployment.
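One way to implement that comparison is to fingerprint each instance's schema and diff the results. This is a sketch, assuming the DDL for each table has already been dumped into a dictionary:

```python
import hashlib

def schema_fingerprint(tables: dict[str, str]) -> dict[str, str]:
    """Hash each table's DDL so two instances can be compared cheaply."""
    return {name: hashlib.sha256(ddl.encode()).hexdigest() for name, ddl in tables.items()}

def find_drift(instance_a: dict[str, str], instance_b: dict[str, str]) -> list[str]:
    """Return tables that exist on only one instance or whose definitions differ."""
    fa, fb = schema_fingerprint(instance_a), schema_fingerprint(instance_b)
    only_one_side = set(fa) ^ set(fb)
    changed = {t for t in fa.keys() & fb.keys() if fa[t] != fb[t]}
    return sorted(only_one_side | changed)

# After rollback, any non-empty result fails the pipeline early.
a = {"users": "CREATE TABLE users (id INT)"}
b = {"users": "CREATE TABLE users (id INT, email TEXT)"}
print(find_drift(a, b))  # ['users']
```

A production version would also compare row counts or data checksums, but the principle is the same: turn "rollback worked" from an assumption into a computed result.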
This approach builds evidence rather than assumptions about the quality and safety of database changes.
The updated workflow aligns with the documented capabilities of Harness Database DevOps. Importantly, it does not assume native data cloning features within Harness itself; instead, it positions data-centric operations (cloning and validation) as composable steps in a broader automation pipeline.
Embedding production data testing inside Harness Database DevOps pipelines delivers measurable outcomes. This integrated, pipeline-oriented approach elevates database change management into a disciplined engineering practice rather than a set of isolated tasks.
Database changes do not fail because teams lack skill or intent. They fail because uncertainty is tolerated until too late in the delivery cycle, when production data, scale, and history finally collide with untested assumptions.
Testing with production data, when executed responsibly, shifts database delivery from hope-based validation to evidence-based confidence. It allows teams to validate not just that a migration applies, but that it performs, rolls back cleanly, and leaves no hidden drift behind. That distinction is the difference between routine releases and high-severity incidents.
By operationalizing this workflow through Harness Database DevOps, organizations gain a governed, repeatable way to validate migrations, exercise rollback paths, and verify that no hidden drift remains. This is not about adding more processes. It is about removing uncertainty from the most irreversible layer of your system.
Explore Harness Database DevOps to see how production-grade database testing, rollback validation, and governed pipelines can fit seamlessly into your existing workflows. The fastest teams don't just deploy quickly; they deploy with confidence.


Shift-Left FinOps reframes cloud cost optimization as an engineering responsibility rather than a retrospective financial review.
Instead of analyzing cloud spend after infrastructure has already been deployed, organizations integrate FinOps automation and cost governance directly into development workflows.
Developers receive immediate feedback about the financial impact of infrastructure changes during development, not weeks later during billing reconciliation.
This approach aligns FinOps best practices with modern platform engineering workflows built around Infrastructure as Code, automated delivery pipelines, and policy-driven governance.
The result is proactive cost management, where waste is prevented before resources ever reach production.
Most organizations still treat FinOps as a retrospective discipline.
Finance teams review monthly cloud bills, identify anomalies, and ask engineering teams to investigate cost spikes.
This approach worked when infrastructure provisioning happened slowly through manual processes.
Modern cloud environments operate differently.
Teams now deploy infrastructure changes dozens or even hundreds of times per day through automated pipelines.
By the time billing data arrives, the operational context behind those provisioning decisions has already disappeared.
This delay introduces several operational challenges.
Without consistent tagging and governance policies, organizations struggle to determine which teams, services, and environments are driving spend. Missing metadata makes cloud financial management and cost attribution significantly harder.
When cost issues are discovered weeks later, teams must retrofit governance controls onto already-running infrastructure.
This reactive approach introduces operational risk and disrupts delivery schedules.
Instead of preventing waste, teams are forced into post-deployment cloud cost optimization efforts.
Developers often make infrastructure decisions without understanding their financial implications.
Without real-time cost visibility, engineers cannot evaluate tradeoffs between performance, scalability, and cost.
Shift-Left FinOps addresses these problems by embedding Infrastructure as Code cost control and governance earlier in the development lifecycle.
Implementing Shift-Left FinOps requires three foundational capabilities: Policy as Code governance, Infrastructure as Code enforcement points, and developer-facing cost visibility. Together, these capabilities enable FinOps automation and proactive cost management across cloud environments.
Policy as Code frameworks translate financial governance requirements into enforceable rules.
These rules automatically evaluate infrastructure definitions before resources are deployed.
Instead of relying on documentation or manual enforcement, policy as code provides automated cloud cost governance directly within engineering workflows.
Effective cost governance policies typically address three categories of waste.
Policies ensure every resource includes metadata required for cost attribution.
Examples include owner, team, environment, and cost-center tags.
Governance rules prevent developers from provisioning oversized instances for workloads that do not require maximum capacity.
This supports long-term cloud cost optimization and prevents overprovisioning.
Policies identify resources missing automated shutdown schedules, retention rules, or lifecycle controls.
These controls prevent idle resources from generating unnecessary costs.
Example conceptual tagging policy:
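As a conceptual sketch of the rule such a policy encodes (in practice a Policy as Code engine would express this in its own policy language; the required tag set here is illustrative):

```python
REQUIRED_TAGS = {"owner", "team", "environment", "cost-center"}  # illustrative tag set

def tag_violations(resource: dict) -> set[str]:
    """Return the required tags missing from a resource definition."""
    return REQUIRED_TAGS - set(resource.get("tags", {}))

# A resource missing 'team' and 'cost-center' fails validation before deployment.
resource = {"type": "aws_instance", "tags": {"owner": "data-eng", "environment": "dev"}}
print(sorted(tag_violations(resource)))  # ['cost-center', 'team']
```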

With Policy as Code enforcement, developers receive immediate feedback during infrastructure validation rather than after deployment.
Infrastructure as Code provides the technical foundation for shift-left cloud cost governance.
IaC allows infrastructure changes to be versioned, reviewed, and automatically validated before deployment.
These characteristics create natural enforcement points for Infrastructure as Code cost control policies.
Developers run policy checks locally before committing infrastructure changes.
This prevents cost policy violations from entering shared repositories.
Early validation enables proactive cost management during development.
Pull request pipelines automatically validate infrastructure definitions against governance policies.
If cost policies fail, the merge is blocked.
Example validation workflow:
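A sketch of the gating logic, with a hypothetical rightsizing rule standing in for a real policy set; in CI, the nonzero exit code is what blocks the merge:

```python
ALLOWED_SIZES = {"t3.micro", "t3.small"}  # hypothetical allow-list

def rightsizing_policy(resource: dict) -> list[str]:
    """Flag instance types outside the allowed sizes."""
    if resource.get("instance_type") not in ALLOWED_SIZES:
        return [f"{resource['name']}: {resource['instance_type']} exceeds allowed sizes"]
    return []

def validate(resources: list[dict], policies: list) -> int:
    """Run every policy over every resource; return the CI exit code."""
    violations = [v for res in resources for policy in policies for v in policy(res)]
    for v in violations:
        print(f"POLICY VIOLATION: {v}")
    return 1 if violations else 0  # nonzero blocks the merge

# Prints the violation, then the blocking exit code.
print(validate([{"name": "web", "instance_type": "m5.24xlarge"}], [rightsizing_policy]))
```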

This ensures consistent cloud cost governance across all infrastructure deployments.
Production deployment pipelines include a final validation step before infrastructure changes are applied.
This layered validation model creates defense in depth for FinOps automation: local checks catch issues earliest, pull request pipelines enforce them consistently, and deployment pipelines provide the final backstop.
Shift-Left FinOps only works when developers have access to cost insights during infrastructure planning.
Without cost visibility, governance policies feel arbitrary and difficult to follow.
Cost estimation should occur during infrastructure planning stages, not after deployment.
When developers run terraform plan, they should also see estimated monthly costs associated with proposed infrastructure changes.
This allows developers to evaluate architectural tradeoffs, such as instance sizing and storage choices, with their cost impact visible up front.
Integrating cost feedback into existing workflows improves adoption.
Examples include cost estimates surfaced alongside plan output, posted as pull request comments, or displayed in delivery dashboards.
These feedback loops support FinOps best practices by making cost awareness part of everyday development work.
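The plan-time estimate described above can be sketched as a lookup over the resources a plan would create. The price table here is illustrative; a real implementation would query a pricing API or a dedicated estimation tool:

```python
# Illustrative on-demand monthly prices; not a real pricing source.
MONTHLY_PRICE = {"t3.micro": 7.59, "t3.large": 60.74, "m5.xlarge": 140.16}

def estimate_plan_cost(planned_resources: list[dict]) -> float:
    """Sum the estimated monthly cost of resources a plan would create."""
    total = sum(MONTHLY_PRICE.get(res.get("instance_type"), 0.0) for res in planned_resources)
    return round(total, 2)

plan = [{"address": "aws_instance.web", "instance_type": "t3.large"},
        {"address": "aws_instance.worker", "instance_type": "t3.micro"}]
print(estimate_plan_cost(plan))  # 68.33
```

Surfacing this number next to `terraform plan` output is what turns an abstract governance policy into a concrete tradeoff the developer can act on.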
Organizations implementing shift-left cloud cost governance frequently encounter predictable challenges.
Understanding these patterns helps teams implement FinOps automation successfully.
Governance policies that generate frequent false positives slow developer velocity.
Developers may attempt to bypass governance controls.
Policies should focus on real sources of cloud cost waste.
Shift-left cost governance fails when platform teams must manually review every infrastructure change.
All governance rules should be automated through Policy as Code and CI/CD validation pipelines.
Providing policies without cost visibility creates confusion.
Developers need both enforceable policies and the cost visibility to understand why those policies exist.
Successful governance systems integrate into existing workflows such as code review, CI/CD validation, and infrastructure planning.
The most effective governance systems remain invisible until violations occur.
Harness Cloud Cost Management enables organizations to implement Shift-Left FinOps and proactive cost management at scale.
The platform integrates FinOps automation and cost governance directly into platform engineering workflows.
Harness CCM connects cloud environments across AWS, Azure, and GCP.
This provides unified cost visibility across multi-cloud infrastructure.
Real-time cost allocation allows developers to see cost breakdowns during infrastructure provisioning rather than waiting for billing cycles.
Teams can enforce governance policies such as required tagging, instance rightsizing, and resource lifecycle controls.
These policies automatically validate infrastructure changes during development and CI/CD pipelines.
Harness CCM also supports cloud cost optimization through automated insights and recommendations, including rightsizing suggestions and detection of idle or underutilized resources.
These capabilities help organizations implement modern cloud financial management practices while maintaining developer velocity.
Shift-Left FinOps prevents waste before resources are deployed by embedding cost governance directly into development workflows.
This proactive model improves cloud cost optimization and cloud financial management outcomes.
Typical implementations combine Policy as Code engines such as Open Policy Agent with cost validation steps in CI/CD pipelines.
These technologies enable automated cloud cost governance through policy as code.
When implemented correctly, automated cost policies do not slow down delivery: they provide immediate feedback without manual approvals.
This improves delivery speed while maintaining cost control and FinOps automation.
Organizations define standardized governance policies that apply across AWS, Azure, and GCP.
These policies focus on universal patterns such as tagging, rightsizing, and resource lifecycle management.
This approach supports consistent multi-cloud cost governance and financial management.
Shift-Left FinOps transforms cloud cost optimization from a reactive financial process into a proactive engineering practice.
By embedding Policy as Code governance, Infrastructure as Code cost control, and automated FinOps workflows into development pipelines, organizations prevent waste before infrastructure reaches production.
The result is stronger cloud cost governance, improved cloud financial management, and more efficient cloud operations.
Developers gain real-time cost insights.
Platform teams enforce governance automatically.
Finance teams gain accurate cost attribution across environments.
At the scale of modern cloud infrastructure, proactive cost management is essential for sustainable cloud growth.


Database systems store some of an organization's most sensitive data, including PII, financial records, and intellectual property, making strong database governance non-negotiable. As regulations tighten and audit expectations increase, teams need governance that scales without slowing delivery.
Harness Database DevOps addresses this by applying policy-driven governance using Open Policy Agent (OPA). With OPA policies embedded directly into database pipelines, teams can automatically enforce rules, capture audit trails, and stay aligned with compliance requirements. This blog outlines how to use OPA in Harness to turn database compliance from a manual checkpoint into a built-in, scalable part of your DevOps workflow.
Organizations face multiple challenges when navigating database compliance: evolving regulatory requirements, manual review bottlenecks, and the need for complete, audit-ready change records.
These challenges highlight the necessity of embedding governance directly into database development and deployment pipelines, rather than treating compliance as a reactive checklist.
Harness Database DevOps is designed to offer a comprehensive solution to database governance - one that aligns automation with compliance needs. It enables teams to adopt policy-driven controls on database change workflows by integrating the Open Policy Agent (OPA) engine into the core of database DevOps practices.
What is OPA and Policy as Code?
OPA is an open source, general-purpose policy engine that decouples policy decisions from enforcement logic, enabling centralized governance across infrastructures and workflows. Policies in OPA are written in the Rego declarative language, allowing precise expression of rules governing actions, access, and configurations.
Harness implements Policy as Code through OPA, enabling teams to store, test, and enforce governance rules directly within the database DevOps lifecycle. This model ensures that compliance controls are consistent, auditable, and automatically evaluated before changes reach production.
Here’s a structured approach to implementing database governance with OPA in Harness:
Start by cataloging your regulatory obligations and internal governance policies. Examples include data retention mandates, access control requirements, and approval rules for destructive changes in production.
Translate these requirements into quantifiable rules that can be expressed in Rego.
Within the Harness Policy Editor, define OPA policies that codify governance rules. For example, a policy might block any migrations containing operations that remove columns in production environments without explicit DBA approval.
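The Rego version of such a rule would live in the Harness Policy Editor; as a sketch of the logic it encodes (operation and field names are illustrative):

```python
def evaluate_migration(changes: list[dict], environment: str, dba_approved: bool) -> list[str]:
    """Deny destructive operations in production unless a DBA has approved them."""
    denials = []
    for change in changes:
        if (change["operation"] == "dropColumn"
                and environment == "production"
                and not dba_approved):
            denials.append(
                f"dropColumn on {change['table']}.{change['column']} requires DBA approval"
            )
    return denials

# An unapproved destructive change in production is blocked; the same change
# with explicit DBA approval proceeds.
change = [{"operation": "dropColumn", "table": "users", "column": "ssn"}]
print(evaluate_migration(change, "production", dba_approved=False))
print(evaluate_migration(change, "production", dba_approved=True))  # []
```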
Harness policies are modular and reusable: you can import and extend them as part of broader governance packages, enabling cross-team reuse and centralized management of rules.
By expressing governance as code, you ensure consistency and remove ambiguity in policy enforcement.
Policies can be linked to specific triggers within your database deployment workflow: for instance, evaluating rules before a migration is applied or before a pipeline advances to production. This integration ensures that non-compliant changes are automatically blocked while compliant changes proceed seamlessly, maintaining the balance between speed and control.
Harness evaluates OPA policies at defined decision points in your pipeline, such as pre-deployment checks. This prevents risky actions, enforces access controls, and aligns every deployment with governance objectives without manual intervention.
Audit Trails and Traceability
Every policy evaluation is logged, creating an auditable trail of who changed what, when, and why. These logs serve as critical evidence during compliance audits or internal reviews, reducing the overhead and risk associated with traditional documentation practices.
By enforcing the principle of least privilege, policies ensure that users and applications possess only the necessary permissions for their specific roles. This restriction on access is crucial for minimizing the potential attack surface and maintaining compliance with regulatory requirements for data access governance.
Database governance is an essential pillar of enterprise compliance strategies. By embedding OPA-based policy enforcement within Harness Database DevOps, organizations can automate compliance controls, minimize risk, and maintain developer productivity. Policy as Code provides a scalable, auditable, and consistent framework that aligns with both regulatory obligations and the need for agile delivery.
Transforming database governance from a manual compliance burden into an automated, integrated practice empowers teams to innovate securely, confidently, and at scale - ensuring that every change respects the policies that protect your data, your customers, and your brand.


Most organizations begin their software supply chain journey the same way: they implement Software Composition Analysis (SCA) to manage open source risk. They scan for vulnerabilities, generate SBOMs, and remediate CVEs. For many teams, that feels like progress—and it is.
But it is not enough.
As discussed in the webinar conversation with Information Security Media Group, open-source visibility is necessary, but it is not sufficient. Modern applications are no longer just collections of third-party libraries. They are built, packaged, signed, stored, promoted, deployed, and increasingly augmented by AI systems. Each stage introduces new dependencies and new trust boundaries.
Events like Log4j made open source risk impossible to ignore. However, the evolution of the threat landscape has demonstrated that attackers are no longer limited to exploiting known vulnerabilities in libraries. They are targeting the mechanics of delivery itself—ingestion, build pipelines, artifact storage, and CI/CD automation. Organizations that stop at SCA are securing one layer of a much broader system.
Artifacts are the final outputs of the build process—container images, binaries, scripts, Helm charts, JAR files. They are what actually run in production. Yet many organizations treat artifact management as operational plumbing rather than a security control point.
The webinar highlighted how focusing exclusively on source code can obscure the reality that artifacts may be tampered with after build, altered during promotion, or stored in misconfigured registries. Visibility into open source dependencies does not automatically guarantee artifact integrity. An attacker who compromises a registry or intercepts promotion workflows can distribute malicious artifacts at scale.
The key risk lies in assuming that once an artifact is built, it is trustworthy. Without signing, provenance tracking, and gating at the registry level, artifacts become one of the most exploitable surfaces in the supply chain.
CI/CD systems hold credentials, secrets, deployment paths, and signing keys. They connect development directly to production. As one of the speakers noted during the discussion, pipelines should be treated as privileged infrastructure and assumed to be potential targets of compromise.
A compromised runner can publish malicious artifacts, exfiltrate secrets, or promote unauthorized builds. This is not theoretical. Attacks involving poisoned GitHub Actions and manipulated build systems demonstrate how easily the pipeline itself can become the distribution mechanism.
Security must therefore extend beyond scanning artifacts to enforcing strict governance within pipelines. This includes least privilege access, ephemeral credentials, audit trails, and Policy as Code enforcement to ensure required security checks cannot be bypassed.
Container ecosystems introduce additional risk vectors. Malicious images uploaded to public registries, typosquatting packages, and compromised upstream components can all infiltrate environments through seemingly legitimate pulls.
Organizations that implicitly trust external registries transfer vendor risk into their own infrastructure. Without upstream proxy controls, cool-off periods, or quarantine mechanisms, container ingestion becomes another blind spot.
The supply chain does not stop at internal code repositories. It extends to every external source that feeds into the build.
Modern delivery pipelines integrate numerous third-party services. Vendors often have privileged access to environments or automated integrations into CI/CD workflows. If a vendor is compromised, that risk propagates downstream.
The webinar discussion emphasized that the supply chain must be viewed as a “trust fabric.” Pipelines, registries, and vendors are all part of that fabric. A weakness in any one node can cascade across the system.
Build systems represent one of the most underestimated attack surfaces in modern software delivery. A source tree may pass review and scanning, yet small modifications in the build process can fundamentally alter the resulting artifact.
Examples discussed during the session included pre-install hooks, registry overrides, runtime installers, or seemingly minor shell script changes that introduce malicious behavior before artifacts are signed or scanned. These changes can bypass traditional SCA tools because the underlying source code appears clean.
This is why build integrity must be verifiable. Provenance should be recorded and tied to specific systems and identities. Build steps should be signed and attested. Promotion gates should require verification of those attestations before artifacts move forward.
Trust must be anchored in the build output, not assumed from the source input.
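In practice this verification is done with signing and attestation tooling (for example, Sigstore's cosign and in-toto attestations). Reduced to its core, a promotion gate checks that the artifact's digest matches the one recorded in signed build provenance:

```python
import hashlib

def verify_artifact(artifact_bytes: bytes, provenance: dict) -> bool:
    """Gate promotion on the artifact matching the digest recorded at build time."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return digest == provenance.get("sha256")

# Provenance is produced (and signed) by the build system; here it is mocked.
artifact = b"container-layer-bytes"
provenance = {"sha256": hashlib.sha256(artifact).hexdigest(), "builder": "ci-runner-7"}
print(verify_artifact(artifact, provenance))           # True
print(verify_artifact(b"tampered-bytes", provenance))  # False
```

The digest check only has value if the provenance record itself is signed and tied to a verified builder identity; otherwise an attacker who swaps the artifact can swap the record with it.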
CI/CD pipelines are often viewed as automation tools, but they are in fact highly privileged systems that bridge development and production. They hold secrets, manage deployment logic, and often operate with broad permissions across infrastructure.
The webinar discussion stressed that pipelines must be treated as untrusted environments by default. This does not imply mistrust of developers, but rather recognition that any high-privilege system is an attractive target.
Policy-as-code frameworks, strict RBAC, auditability, and enforcement of mandatory security checks ensure that controls cannot be disabled under pressure to ship. Developers may unintentionally bypass safeguards when under deadlines. Governance mechanisms must therefore be systemic, not optional.
In complex environments where multiple tools—GitHub Actions, Jenkins, Docker, Kubernetes—are integrated together, misconfiguration becomes another source of risk. Each tool has its own security model. Without centralized governance, complexity compounds vulnerability.
As if artifacts, pipelines, and containers were not enough, AI-native applications are adding an entirely new dimension to supply chain security.
Modern applications increasingly rely on Large Language Models, prompt libraries, embeddings, model weights, and training datasets. These components influence runtime behavior in ways that traditional code does not. Yet they are rarely tracked or governed with the same rigor as open-source dependencies.
The concept of an “AI Bill of Materials” is emerging, but no standardized framework currently exists. Organizations are integrating AI features faster than governance standards can keep pace.
The risks differ from traditional CVEs. Poisoned training data can subtly manipulate model behavior. Backdoored model weights can introduce hidden functionality. Prompt injection attacks can trick systems into exposing sensitive information. Shadow AI systems may be deployed without formal oversight.
Unlike deterministic software, AI systems produce probabilistic outputs. Static security testing does not fully address this unpredictability. Security teams must now consider model provenance, data lineage, vendor trust, and runtime behavior monitoring as part of the supply chain equation.
Even the build-versus-buy decision for LLMs becomes a supply chain governance choice. Building offers control but introduces operational burden and long-term responsibility. Buying accelerates deployment but increases trust dependency on external vendors. In both cases, AI components extend the trust fabric and must be governed accordingly.
Moving beyond SCA requires structured controls across the full lifecycle of delivery.
Ingest-Time Controls ensure that risky packages and images are prevented from entering developer workflows in the first place through dependency firewalls, upstream proxy governance, and vendor controls.
Build-Time Integrity requires signed environments, provenance attestations, and enforcement of SLSA-style compliance so that artifacts can be cryptographically tied to verified build processes.
Promotion-Time Governance introduces artifact registry gating, quarantine workflows, and policy enforcement to prevent unauthorized or tampered artifacts from advancing to production.
Runtime Verification ensures continuous monitoring of deployment health, secret usage, and, increasingly, AI behavior to detect anomalous activity after release.
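The ingest-time layer is the easiest to picture as code. Below is a minimal sketch of a dependency-firewall decision; the function name, the quarantine rule, and the two-day default hold for fresh releases are all invented for illustration:

```python
def firewall_decision(pkg: str, version: str,
                      denylist: set,
                      min_age_days: dict,
                      package_age_days: int) -> str:
    """Decide whether a requested package may enter the internal proxy.

    Policy sketch: explicit deny entries are blocked outright; very new
    releases are quarantined for review (a common defense against
    hijacked versions); everything else passes through to upstream.
    """
    if (pkg, version) in denylist:
        return "block"
    if package_age_days < min_age_days.get(pkg, 2):
        return "quarantine"  # hold fresh releases until reviewed
    return "allow"
```

The same decision shape (block, quarantine, allow) recurs at build, promotion, and runtime; only the evidence being evaluated changes.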
This layered approach transforms supply chain security from a reactive scanning function into an operational control system embedded directly into software delivery workflows.
Software supply chain security has evolved.
It is no longer an open-source vulnerability problem alone. It is a trust management challenge spanning artifacts, pipelines, containers, vendors, and AI components.
Organizations that succeed will not stop at generating reports. They will enforce policy at every stage. They will treat CI/CD as privileged infrastructure. They will require attestations before promotion. They will govern ingestion and monitor runtime behavior. They will extend security controls into AI-native systems.
Supply chain security must move beyond visibility. It must deliver enforceable control across the entire software delivery lifecycle. Software supply chain security isn’t about scanning more. It’s about governing every stage of software delivery from code to artifact to pipeline to production to AI.
Ready to see how Harness embeds supply chain security directly into CI/CD, artifact governance, and AI-powered verification?
Explore Harness Software Supply Chain Security solutions and secure your delivery pipeline end-to-end.
Does SCA alone protect the software supply chain?
No. SCA provides visibility into open-source vulnerabilities but does not protect CI/CD pipelines, artifact integrity, container ecosystems, or AI components.
Why are CI/CD pipelines such a high-value target?
Pipelines hold credentials, signing keys, and deployment paths. A compromised runner can inject malicious artifacts or exfiltrate secrets.
What does artifact governance involve?
Artifact governance includes registry gating, quarantine workflows, attestation verification, and policy enforcement before artifacts are promoted or deployed.
What is an AI Bill of Materials (AI-BOM)?
An AI-BOM would catalog AI components such as models, prompts, embeddings, and training data. Standards are still emerging.
How do modern supply chain attacks evade traditional scanning?
They exploit ingestion workflows, build steps, compromised pipelines, or malicious container images — rather than known CVEs.
Is it better to build or buy LLMs?
Build offers control but high cost and operational burden. Buy offers speed and vendor expertise but introduces trust and governance risks.


For decades, SCM has meant one thing: Source Code Management. Git commits, branches, pull requests, and version history. The plumbing of software delivery. But as AI agents show up in every phase of the software development lifecycle, from writing a spec to shipping code to reviewing a PR, the acronym is quietly undergoing its most important transformation yet.
And this isn't a rebrand. It's a rethinking of what a source repository is, what it stores, and what it serves, not just to developers, but to the agents working alongside them.
AI agents in software development are powerful but contextually blind by default. Ask a coding agent to implement a feature and it will reach out and read files, one by one, directory by directory, until it has assembled enough context to act. Ask a code review agent to assess a PR and it will crawl through the codebase to understand what changed and why it matters.
Anthropic's 2026 Agentic Coding Trends Report documents this shift in detail: the SDLC is changing dramatically as single agents evolve into coordinated multi-agent teams operating across planning, coding, review, and deployment. The report projects the AI agents market to grow from $7.84 billion in 2025 to $52.62 billion by 2030. But as agents multiply across the lifecycle, so does their hunger for codebase context, and so does the cost of getting that context wrong.
This approach has two brutal failure modes: the agent either misses context it never thought to read, or burns time and tokens crawling far more of the repository than the task requires.
The result? Agents that hallucinate implementations because they missed a key abstraction three directories away. Code reviewers that flag style issues but miss architectural regressions. PRD generators that know the syntax of your codebase but not its soul.
The bottleneck is not the model. It is the absence of a pre-computed, semantically rich, always-available representation of the entire codebase: a context engine.
Consider a simple task: "Add rate limiting to the /checkout endpoint."
Without a context engine, a coding agent opens checkout.go, reads the handler function, and writes a token-bucket rate limiter inline at the top of the handler. The code compiles. The tests pass. The PR looks clean.
The agent missed three things: the service already implements rate limiting through a shared middleware pattern, that pattern exposes a common interface every limiter is expected to implement, and limiters are expected to emit metrics and carry tests that match the package's conventions.
The code works. The team that maintains it finds it wrong in every way that matters. A senior engineer catches these issues in review, requests changes, and the cycle restarts. Multiply this by every agent-generated PR across every team, every day.
With a context engine, the same agent queries before writing code: "How is rate limiting implemented in this service?" The context engine returns:
The agent writes a new rate limiter that follows the established pattern, implements the shared interface, emits metrics through the standard pipeline, and includes tests that match the existing style. The PR wins approval on the first pass.
The difference is context quality, not model quality.
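Under the hood, that pre-write query might look something like this toy sketch. Everything here (the `ContextEngine` class, the `how_is_it_done` method, the file paths) is hypothetical, standing in for a real MCP or REST interface:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContextAnswer:
    pattern_file: str     # where the established pattern lives
    interface: str        # shared interface new code must satisfy
    examples: list        # adjacent implementations to imitate

class ContextEngine:
    """Toy stand-in for a repo-attached context engine: a pre-built
    index the agent queries instead of crawling files."""
    def __init__(self, index: dict):
        self._index = index

    def how_is_it_done(self, concern: str) -> Optional[ContextAnswer]:
        return self._index.get(concern.lower())

engine = ContextEngine({
    "rate limiting": ContextAnswer(
        pattern_file="internal/middleware/ratelimit.go",
        interface="Limiter",
        examples=["api/search/limits.go", "api/login/limits.go"],
    )
})

# The agent asks before it writes a single line.
answer = engine.how_is_it_done("rate limiting")
```

A real engine would answer from the semantic index described below rather than a hand-built dictionary, but the agent-facing contract is the same: ask first, write second.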
The Language Server Protocol (LSP) transformed developer tooling in the past decade. By standardizing the interface between editors and language-aware backends, LSP gave every IDE, from VS Code to Neovim, access to autocomplete, go-to-definition, hover documentation, and real-time diagnostics. LSP was designed to serve a specific consumer: a human developer, working interactively, in a single file at a time. That design made the right trade-offs for its era:
For interactive development, these are strengths. LSP excels at what it was built to do.
Agents are a different class of consumer. They don't sit in a file waiting for cursor events. They operate across entire repositories, across SDLC phases, often in parallel. They need the full semantic picture before they start, not incrementally as they navigate.
Agents need not a replacement for LSP, but a complement: something pre-built, always available, queryable at repo scale, and semantically complete, ready before anyone opens a file.
Lossless Semantic Trees (LST), pioneered by the OpenRewrite project (born at Netflix, commercialized by Moderne), take a different approach to code representation.
Unlike the traditional Abstract Syntax Tree (AST), an LST:
This is the first layer of a Source Context Management system. Not raw files. Not a running language server. A pre-indexed semantic tree of the entire codebase, queryable by agents at any time.
A proper Source Context Management system is not a single component. It is a three-layer stack that turns a repository from a file store into something agents can actually reason over.
Every file in the repository is parsed into an LST and simultaneously embedded into a vector representation. This creates two complementary indices:
The LST and semantic indices are projected into a code knowledge graph, a property graph where nodes are functions, classes, modules, interfaces, and comments, and edges are relationships: calls, imports, inherits, implements, modifies, tests.
This graph enables queries like:
The context engine exposes itself through a Model Context Protocol (MCP) server or REST API, so any agent (whether a coding agent, a review agent, a risk assessment agent, or a documentation agent) can query the context engine directly, retrieving precisely the subgraph or semantic chunk it needs, without ever touching the raw file system.
The key insight: agents never read files. They query the context engine.
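To make the graph layer concrete, here is a toy, stdlib-only sketch of the kinds of queries a review or risk agent would run. A real system would back this with a graph database, and every node name here is invented:

```python
from collections import defaultdict, deque

class CodeGraph:
    """Minimal property-graph sketch: nodes are code entities, edges are
    typed relationships such as 'calls', 'imports', or 'tests'."""
    def __init__(self):
        self.edges = defaultdict(list)   # (src, relation) -> [dst]
        self.redges = defaultdict(list)  # (dst, relation) -> [src]

    def add(self, src, relation, dst):
        self.edges[(src, relation)].append(dst)
        self.redges[(dst, relation)].append(src)

    def callers_of(self, fn):
        """Who calls `fn`? (what a review agent asks about a diff)"""
        return self.redges[(fn, "calls")]

    def blast_radius(self, fn):
        """Everything transitively depending on `fn` via call edges
        (what an on-call or risk agent asks during an incident)."""
        seen, queue = set(), deque([fn])
        while queue:
            cur = queue.popleft()
            for caller in self.redges[(cur, "calls")]:
                if caller not in seen:
                    seen.add(caller)
                    queue.append(caller)
        return seen

g = CodeGraph()
g.add("checkout_handler", "calls", "rate_limit")
g.add("payment_handler", "calls", "rate_limit")
g.add("http_router", "calls", "checkout_handler")

g.callers_of("rate_limit")    # ['checkout_handler', 'payment_handler']
g.blast_radius("rate_limit")  # {'checkout_handler', 'payment_handler', 'http_router'}
```

High-centrality nodes fall out of the same structure: a node with many transitive callers is exactly the kind of change a risk agent should score highly.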
A single context engine can serve every phase of the software development lifecycle.
A PRD agent queries the context engine to understand existing capabilities, technical constraints, and module boundaries before generating a requirements document. It produces specs grounded in what the system actually is, not what someone thinks it is.
A spec agent traverses the code graph to identify affected components, surface similar prior implementations, flag integration points, and propose an architecture, all without reading a single file directly.
A coding agent retrieves the precise subgraph surrounding the feature area: the types it needs to implement, the interfaces it must satisfy, the patterns used in adjacent modules, the test conventions for this package. It writes code that fits the codebase, not just code that compiles.
A review agent queries the context engine to understand the semantic diff, not just what lines changed, but what that change means for the rest of the system. It can immediately surface:
A risk agent scores every PR against the code graph, identifying high-centrality nodes (code that many things depend on), historically buggy modules, and changes that cross team ownership boundaries. No DORA metrics spreadsheet required.
A documentation agent can traverse the code graph to generate living documentation (architecture diagrams, module dependency maps, API contracts) that updates automatically as the codebase evolves. Design principles can be encoded as graph constraints and validated on every merge.
When a production incident occurs, an on-call agent queries the context engine with the failing component and gets an immediate blast-radius map, the last 10 changes to that subgraph, the owners, and the test coverage status. Time-to-understanding drops from hours to seconds.
The business case is simple:
This is not a theoretical architecture. Tools exist today:
The missing piece is not any individual component. It is the platform that assembles them into a unified, repo-attached context engine that every agent in the SDLC can query through a single interface.
Source Context Management faces real engineering challenges:
This is the shift:
A repository is not a collection of files. A repository is a knowledge graph with a version history attached.
Git's job is to version that knowledge. The context engine's job is to make it queryable. The agent's job is to act on it.
Follow this model and the consequences are concrete. Every CI/CD pipeline should include a context engine update step, as natural as running tests. Every developer platform should expose a context engine API alongside its code hosting API. Every AI coding tool should be evaluated not just on model quality but on context engine quality.
Source code repositories that don't invest in their context layer will produce agents that are fast but wrong. Repositories with rich, well-maintained context engines will produce agents that feel like senior engineers, because they have the same depth of understanding of the codebase that a senior engineer carries in their head.
The LSP gave us IDE intelligence. Git gave us version control. Docker gave us portable environments. Kubernetes gave us cluster orchestration. Each of these was an infrastructure primitive that unlocked a new generation of developer tooling.
The context engine is the next such primitive. It is the prerequisite for every agentic SDLC capability worth building. And like every infrastructure primitive before it, the teams and platforms that build it first will be hard to catch.
SCM is no longer just about managing source code. It's about managing the context that makes the source code understandable.


Did Terraform vendor lock-in just become your biggest operational risk without you noticing? When HashiCorp changed Terraform's license from MPL to BSL in August 2023, the legal terms were not the only thing that changed. The move fundamentally shifted the operational landscape for thousands of platform teams who built their infrastructure automation around what they believed was an open, community-driven tool. If your organization runs Terraform at scale, you're now facing a strategic decision that wasn't on your roadmap six months ago.
The uncomfortable truth is that most teams didn't architect for IaC portability. Why would they? Terraform was open source. It was the standard. And now, many organizations find themselves in a position they swore they'd never be in again after the Kubernetes wars: locked into a single vendor's roadmap, pricing model, and strategic priorities.
This isn't theoretical. It's the reality platform engineers are dealing with right now.
Terraform lock-in wasn't always a concern. For years, Terraform represented the opposite of vendor lock-in. It was open source, cloud-agnostic, and community-driven. Teams built entire operational models around it. They trained engineers, standardized on HCL, built module libraries, and integrated Terraform deeply into CI/CD pipelines. At the time, those were exactly the right investments to make.
Then HashiCorp moved to the Business Source License. Suddenly, the "open" in "open source" came with conditions. The BSL restricts certain commercial uses, and while many organizations technically fall outside those restrictions, the change introduced uncertainty.
The deeper problem is architectural. Most teams didn't design for IaC engine portability because they didn't need to. Terraform state files, provider interfaces, and workflow patterns became embedded assumptions. Module libraries assumed Terraform syntax. Pipelines called `terraform plan` and `terraform apply` directly. When every workflow is tightly coupled to a single tool's CLI and API, switching becomes expensive.
This is classic vendor lock-in, even if it happened gradually and without malice.
The immediate cost of Terraform lock-in isn't the license itself. It's what you can't do when you're locked in.
If HashiCorp decides to sunset features, deprecate APIs, or introduce breaking changes, you either adapt on their timeline or do without, stuck on an outdated version with mounting technical debt.
The operational risk compounds over time. When you're locked into a single IaC tool, you're also locked into its limitations. If drift detection isn't native, you build workarounds. If policy enforcement is bolted on, you maintain custom integrations. If the state backend causes performance issues at scale, you optimize around the bottleneck rather than solving the root problem.
And then there's the talent risk. If your team only knows Terraform, and the industry shifts toward other IaC paradigms, you're either retraining everyone or competing for a shrinking talent pool. Monocultures are fragile.
The good news is that escaping Terraform lock-in doesn't require a full rewrite. It requires a deliberate strategy to introduce portability into your IaC architecture.
OpenTofu emerged as the open-source fork of Terraform immediately after the license change. It's MPL-licensed, community-governed through the Linux Foundation, and API-compatible with Terraform 1.5.x. For most teams, OpenTofu migration is the lowest-friction path to regaining control over your IaC engine.
Migrating to OpenTofu doesn't mean abandoning your existing Terraform workflows. Because OpenTofu maintains compatibility with Terraform's core primitives, you can run OpenTofu side-by-side with Terraform during a transition. This lets you validate behavior, test edge cases, and build confidence before committing fully.
The strategic advantage of OpenTofu is not just licensing; it's optionality. Once you're no longer tied to HashiCorp's roadmap, you can evaluate IaC engines based on technical merit rather than sunk cost.
The harder part of escaping IaC vendor lock-in is decoupling your operational workflows from Terraform-specific patterns. This means abstracting your pipelines so they don't hardcode `terraform plan` and `terraform apply`. It means designing module interfaces that could theoretically support multiple engines. It means treating the IaC engine as an implementation detail rather than the foundation of your architecture.
This is where infrastructure as code portability becomes a design principle. If your pipelines call a generic "plan" and "apply" interface, switching engines becomes a simple configuration change, not a migration project.
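That generic interface can be as thin as a translation layer over the engines' largely shared CLIs. A minimal sketch, using the `-chdir` and `-auto-approve` flags that both `terraform` and `tofu` accept (the wrapper itself and its configuration shape are invented for illustration):

```python
# Which binary backs the generic interface is pure configuration.
ENGINE_BINARIES = {
    "terraform": "terraform",
    "opentofu": "tofu",
}

def build_command(engine: str, action: str, workspace_dir: str,
                  auto_approve: bool = False) -> list:
    """Translate a generic plan/apply request into an engine-specific
    CLI invocation. Pipelines call this interface; swapping engines is
    a config change, not a pipeline rewrite."""
    if action not in ("plan", "apply"):
        raise ValueError(f"unsupported action: {action}")
    cmd = [ENGINE_BINARIES[engine], f"-chdir={workspace_dir}", action]
    if action == "apply" and auto_approve:
        cmd.append("-auto-approve")
    return cmd

build_command("opentofu", "plan", "envs/prod")
# -> ['tofu', '-chdir=envs/prod', 'plan']
```

A pipeline built against `build_command` (or its equivalent in a managed platform) never hardcodes a vendor's binary, which is precisely the decoupling described above.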
The reality is that most large organizations will eventually run multiple IaC tools. Some teams will use OpenTofu. Others will stick with Terraform for compatibility with existing state. New projects might adopt Terragrunt for DRY configurations or Pulumi for type-safe infrastructure definitions.
Fighting this diversity creates friction. Embracing it requires tooling that supports multi-IaC environments without forcing everyone into a lowest-common-denominator workflow. You need a platform that treats OpenTofu, Terraform, and other engines as first-class citizens, not as competing standards.
Harness Infrastructure as Code Management was built to solve the multi-IaC problem that most teams are only now realizing they have. It doesn't force you to pick a single engine. It doesn't assume Terraform is the default. It treats OpenTofu and Terraform as equally supported engines, with workflows that abstract away engine-specific details while preserving the flexibility to use either.
This matters because escaping Terraform lock-in isn't just about switching tools. It's about building infrastructure automation that doesn't collapse the next time a vendor changes direction.
Harness IaCM supports OpenTofu and Terraform natively, which means you can run both engines in the same platform without maintaining separate toolchains. You get unified drift detection, policy enforcement, and workspace management across engines. If you're migrating from Terraform to OpenTofu, you can run both during the transition and compare results side-by-side.
The platform also supports Terragrunt, which means teams that have invested in DRY Terraform configurations don't have to throw away that work to gain vendor neutrality. You can keep your existing module structure while gaining the operational benefits of a managed IaC platform.
Beyond engine support, Harness IaCM addresses the systemic problems that make IaC vendor lock-in so painful. The built-in Module and Provider Registry means you're not dependent on third-party registries that could introduce their own lock-in. Variable Sets and Workspace Templates let you enforce consistency without hardcoding engine-specific logic into every pipeline. Default plan and apply pipelines abstract away the CLI layer, so switching engines doesn't require rewriting every workflow.
Drift detection runs continuously, which means you catch configuration drift before it becomes an incident. Policy enforcement happens at plan time, which means violations are blocked before they reach production. These aren't afterthoughts or plugins. They're native platform capabilities that work the same way regardless of which IaC engine you're using.
And because Harness IaCM is part of the broader Harness Platform, you can integrate IaC workflows with CI/CD, feature flags, and policy governance without duct-taping together disparate tools. This is the architectural model that makes multi-IaC tool management practical at scale.
Explore the Harness IaCM product or dive into the technical details in the IaCM docs.
Escaping Terraform lock-in is not about abandoning Terraform everywhere tomorrow. It's about regaining strategic control over your infrastructure automation. It's about designing for portability so that future licensing changes, roadmap shifts, or technical limitations don't force another painful migration.
The teams that will navigate this transition successfully are the ones that treat IaC engines as interchangeable components in a larger platform architecture. They're the ones that build workflows that abstract away engine-specific details. They're the ones that invest in tooling that supports multi-IaC environments without creating operational chaos.
If your organization is still locked into Terraform, now is the time to architect for optionality. Start by evaluating OpenTofu migration paths. Decouple your pipelines from engine-specific CLI calls. Adopt a platform that treats IaC engines as implementation details, not strategic dependencies.
Because the next time a vendor changes their license, you want to be in a position to evaluate your options, not scramble for a migration plan.


AI made writing code faster. It didn’t make releasing that code safer.
That’s the tension platform teams are dealing with right now. Development velocity is rising, but release operations still depend on too many manual decisions, too many disconnected tools, and too much tribal knowledge. Teams can deploy more often, but they still struggle to standardize how features are exposed, how approvals are handled, how risky changes are governed, and how old flags get cleaned up before they turn into debt.
That’s where the latest Harness FME integrations matter.
Harness Feature Management & Experimentation is no longer just a place to create flags and run tests. With recent pipeline integration and policy support, FME becomes part of a governed release system. That’s the bigger story.
Feature flags are valuable. But at scale, value comes from operationalizing them.
The software delivery gap is getting easier to see.
In a recent Harness webinar, Lena Sano, a software developer on the Harness DevRel team, and I framed the problem clearly: AI accelerates code creation, but the release system behind it often still looks manual, inconsistent, and fragile.
That perspective matters because both Lena and I sit close to the problem from different angles. I brought the platform and operating-model view. Lena showed what it actually looks like when feature release becomes pipeline-driven instead of person-driven.
The tension we described is familiar to most platform teams. When more code gets produced, more change reaches production readiness. That doesn't automatically translate into safer releases. In fact, it usually exposes the opposite. Teams start batching more into each launch, rollout practices diverge from service to service, and approvals become a coordination tax instead of a control mechanism.
That’s why release discipline matters more in the AI era, not less.
Feature flags solve an important problem: they decouple deployment from release.
That alone is a major improvement. Teams can deploy code once, expose functionality gradually, target cohorts, run experiments, and disable a feature without redeploying the whole application.
But a flag by itself is not a release process.
I made the point directly in the webinar: feature flags are “the logical end of the pipeline process.” That line gets to the heart of the issue. When flags live outside the delivery workflow, teams get flexibility but not consistency. They can turn things on and off, but they still don’t have a standardized path for approvals, staged rollout, rollback decisions, or cleanup.
That’s where many programs stall. They adopt feature flags, but not feature operations.
The result is predictable:
This is why platform teams need more than flagging. They need a repeatable system around feature release.
The recent Harness FME pipeline integration addresses exactly that gap.
In the webinar demo, Lena showed a feature release workflow where the pipeline managed status updates, targeting changes, approvals, rollout progression, experiment review, and final cleanup. I later emphasized that “95% of it was run by a single pipeline.”
That’s not just a useful demo line. It’s the operating model platform teams have been asking for.
The first value of pipeline integration is simple: teams get a common release language.
Instead of every service or squad improvising its own process, pipelines can define explicit rollout stages and expected transitions. A feature can move from beta to ramping to fully released in a consistent, visible way.
That sounds small, but it isn’t. Standardized states create transparency, reduce confusion during rollout, and make it easier for multiple teams to understand where a change actually is.
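Those expected transitions are easy to encode explicitly. A minimal sketch, with stage names (`beta`, `ramping`, `released`) standing in for whatever vocabulary a team standardizes on:

```python
# Which stage may follow which: the pipeline's shared release language.
ALLOWED_TRANSITIONS = {
    "beta": {"ramping", "killed"},
    "ramping": {"released", "killed"},
    "released": {"retired"},
}

def advance(current: str, target: str) -> str:
    """Move a feature to its next rollout stage, rejecting any
    transition the release process does not define (e.g. jumping
    from beta straight to released)."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target

stage = advance("beta", "ramping")  # the defined next step
# advance("beta", "released")       # would raise: skips the ramp
```

The value is less in the code than in the agreement it forces: every team moving through the same named stages, visible in the same pipeline.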
Approvals are often where release velocity goes to die.
Without pipelines, approvals happen per edit or through side channels. A release manager, product owner, or account team gets pulled in repeatedly, and the organization calls that governance.
It isn’t. It’s coordination overhead.
Harness pipelines make approvals part of the workflow itself. That means platform teams can consolidate approval logic, trigger it only when needed, and capture the decision in the same system that manages the rollout.
That matters operationally and organizationally. It reduces noise for approvers, creates auditability, and keeps release evidence close to the actual change.
One of the most useful ideas in the webinar was that rollback should depend on what actually failed.
If the problem is isolated to a feature treatment, flip the flag. If the issue lives in the deployment itself, use the pipeline rollback or redeploy path. That flexibility matters because forcing every incident through a full application rollback is both slower and more disruptive than it needs to be.
With FME integrated into pipelines, teams don’t have to choose one blunt response for every problem. They can respond with the right mechanism for the failure mode.
That’s how release systems get safer.
Most organizations talk about flag debt after they’ve already created it.
The demo tackled that problem directly by making cleanup part of the release workflow. Once the winning variant was chosen and the feature was fully released, the pipeline paused for confirmation that the flag reference had been removed from code. Then targeting was disabled and the release path was completed.
That is a much stronger model than relying on someone to remember cleanup later.
Feature flags create leverage when they’re temporary control points. They create drag when they become permanent artifacts.
Pipelines standardize motion. Policies standardize behavior.
That’s why the recent FME policy integration matters just as much as pipeline integration.
As organizations move from dozens of flags to hundreds or thousands, governance breaks down fast. Teams start hitting familiar failure modes: flags without owners, inconsistent naming, unsafe default treatments, production targeting mistakes, segments that expose sensitive information, and change requests that depend on people remembering the rules.
Policy support changes that.
Harness now brings Policy as Code into feature management so teams can enforce standards automatically instead of managing them with review boards and exceptions.
This is the core release management tradeoff most organizations get wrong.
They think the only way to increase safety is to add human checkpoints everywhere. That works for a while. Then scale arrives, and those checkpoints become the bottleneck.
Harness takes a better approach. Platform teams can define policies once using OPA and Rego, then have Harness automatically evaluate changes against those policy sets in real time.
That means developers get fast feedback without waiting for a meeting, and central teams still get enforceable guardrails.
That is what scalable governance looks like.
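In practice these guardrails would be written in Rego and evaluated by OPA. The Python sketch below only illustrates the shape of such rules; the specific checks (naming, ownership, default treatment) are illustrative examples, not Harness's built-in policy set:

```python
import re

def evaluate_flag_policies(flag: dict) -> list:
    """Evaluate a flag definition against a few example guardrails of
    the kind a platform team might encode as policy. Returns a list of
    violations; an empty list means the change passes."""
    violations = []
    if not re.fullmatch(r"[a-z][a-z0-9_]*", flag.get("name", "")):
        violations.append("name must be lower_snake_case")
    if not flag.get("owner"):
        violations.append("flag must declare an owner")
    if flag.get("default_treatment") != "off":
        violations.append("default treatment must be 'off' for new flags")
    return violations

evaluate_flag_policies({"name": "new_checkout", "owner": "payments",
                        "default_treatment": "off"})  # -> []
```

Because the rules run on every change rather than in a review meeting, developers get the violation list in seconds and central teams never have to chase exceptions by hand.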
The strongest part of the policy launch is that it doesn’t stop at the flag object itself.
It covers the areas where release risk actually shows up:
That matters because most rollout failures aren’t caused by the existence of a flag. They’re caused by how that flag is configured, targeted, or changed.
Governance only works when it matches how organizations are structured.
Harness policy integration supports that with scope and inheritance across the account, organization, and project levels. Platform teams can set non-negotiable global guardrails where they need them, while still allowing business units or application teams to define more specific policies in the places that require flexibility.
That is how you avoid the two classic extremes: the wild west and the central committee.
Global standards stay global. Team-level nuance stays possible.
The most important point here is not that Harness added two more capabilities.
It’s that these capabilities strengthen the same release system.
Pipelines standardize the path from deployment to rollout. FME controls release exposure, experimentation, and feature-level rollback. Policy as Code adds guardrails to how teams create and change those release controls. Put together, they form a more complete operating layer for software change.
That is the Harness platform value.
A point tool can help with feature flags. Another tool can manage pipelines. A separate policy engine can enforce standards. But when those pieces are disconnected, the organization has to do the integration work itself. Process drift creeps in between systems, and teams spend more time coordinating tools than governing change.
Harness moves that coordination into the platform.
This is the same platform logic that shows up across continuous delivery and GitOps, Feature Management & Experimentation, and modern progressive delivery strategies. The more release decisions can happen in one governed system, the less organizations have to rely on handoffs, tickets, and tribal knowledge.
The webinar and the new integrations point to a clearer operating model for modern release management.
Use CD to ship the application safely. Then use FME to expose the feature by cohort, percentage, region, or treatment.
Standardize stages, approvals, status transitions, and evidence collection so every release doesn’t invent its own operating model.
Move governance into Policy as Code. Don’t ask people to remember naming standards, metadata requirements, targeting limits, or approval conditions.
Use the flag, the pipeline, or a redeploy path based on the actual failure mode. Don’t force every issue into one response pattern.
Treat cleanup as a first-class release step, not a future best intention.
This is the shift platform engineering leaders should care about. The goal isn’t to add feature flags to the stack. It’s to build a governed release system that can absorb AI-era change volume without depending on heroics.
If this model is working, the signal should show up in operational metrics.
Start with these:
These are the indicators that tell you whether release governance is scaling or just getting noisier.
AI made software creation faster, but it also exposed how weak most release systems still are.
Feature flags help. Pipelines help. Policy as code helps. But the real value shows up when those capabilities work together as one governed release model.
That’s what Harness FME now makes possible. Teams can standardize rollout paths, automate approvals where they belong, enforce policy without slowing delivery, and clean up flags before they become operational debt. That is what it means to release fearlessly on a platform, not just with a point tool.
Ready to see how Harness helps platform teams standardize feature releases with built-in governance? Contact Harness for a demo.
Pipelines automate deployment and standardize release workflows. Feature flags decouple deployment from feature exposure, which gives teams granular control over rollout, experimentation, and rollback. Together, they create a safer and more repeatable release system.
It brings feature release actions into the same workflow that manages delivery. Teams can standardize status changes, targeting, approvals, rollout progression, and cleanup instead of handling those steps manually or in separate tools.
At scale, manual governance breaks down. Policy as code lets platform teams enforce standards automatically on flags, targeting rules, segments, and change requests so safety doesn’t depend on people remembering the rules.
Teams can enforce naming conventions, ownership and tagging requirements, safer targeting defaults, environment-specific rollout rules, segment governance, and approval requirements for sensitive change requests.
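As a sketch of what such checks can look like in practice, here is a hypothetical policy function that a pipeline could run against every flag change request. The rules and field names are invented for illustration and are not Harness's OPA schema:

```python
import re

# Hypothetical policy a platform team might enforce on every
# feature-flag change request: naming, ownership, and safer defaults.
NAMING = re.compile(r"^[a-z][a-z0-9_]*$")

def validate_flag(flag: dict) -> list[str]:
    violations = []
    if not NAMING.match(flag.get("name", "")):
        violations.append("name must be snake_case")
    if not flag.get("owner"):
        violations.append("flag must declare an owning team")
    if "rollback" not in flag.get("tags", []):
        violations.append("flag must be tagged with a rollback plan")
    if flag.get("env") == "production" and flag.get("default") != "off":
        violations.append("production flags must default to off")
    return violations

# A non-compliant change request is rejected with every violation listed.
bad_flag = {"name": "NewCheckout", "env": "production", "default": "on"}
print(validate_flag(bad_flag))
```

The point is that these checks run automatically on every request, so safety never depends on an individual remembering the rules.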
It reduces risk by combining progressive rollout controls with standardized workflows and automated governance. Teams can limit blast radius, catch unsafe changes earlier, and respond with the right rollback path when issues appear.
It shows how Harness connects delivery automation, feature release control, and governance in one system. That reduces toolchain sprawl and turns release management into a platform capability rather than a collection of manual steps.
They make cleanup part of the workflow. When the rollout is complete and the winning treatment is chosen, the pipeline should require confirmation that the flag has been removed from code and no longer needs active targeting.


Releasing fearlessly isn't just about getting code into production safely. It's about knowing what happened after the release, trusting the answer, and acting on it without stitching together three more tools.
That is where many teams still break down.
They can deploy. They can gate features. They can even run experiments. But the moment they need trustworthy results, the workflow fragments. Event data moves into another system. Metric definitions drift from business logic. Product, engineering, and data teams start debating the numbers instead of deciding what to do next.
That's why Warehouse Native Experimentation matters.
Today, Harness is making Warehouse Native Experimentation generally available in Feature Management & Experimentation (FME). After proving the model in beta, this capability is now ready for broader production use by teams that want to run experiments directly where their data already lives.
This is an important launch on its own. It is also an important part of the broader Harness platform story.
Because “release fearlessly” is incomplete if experimentation still depends on exported datasets, shadow pipelines, and black-box analysis.
The AI era changed one thing fast: the volume of change.
Teams can create, modify, and ship software faster than ever. What didn't automatically improve was the system that turns change into controlled outcomes. Release coordination, verification, experimentation, and decision-making are still too often fragmented across different tools and teams.
That's the delivery gap.
In a recent Harness webinar, Lena Sano, a Software Developer on the Harness DevRel team, and I showed why this matters. The point was straightforward: deployment alone is not enough. As I said in the webinar, feature flags are “the logical end of the pipeline process.”
That framing matters because it moves experimentation out of the “nice to have later” category and into the release system itself.
When teams deploy code with Harness Continuous Delivery, expose functionality with Harness FME, and now analyze experiment outcomes with trusted warehouse data, the release moment becomes a closed loop. You don't just ship. You learn.
Warehouse Native Experimentation extends Harness FME with a model that keeps experiment analysis inside the data warehouse instead of forcing teams to export data into a separate analytics stack.
That matters for three reasons.
First, it keeps teams closer to the source of truth the business already trusts.
Second, it reduces operational drag. Teams do not need to build and maintain pipelines that move assignment and event data around just to answer basic product questions.
Third, it makes experimentation more credible across functions. Product teams, engineers, and data stakeholders can work from the same governed data foundation instead of arguing over two competing systems.
General availability makes this model ready to support production experimentation programs that need more than speed. They need trust, repeatability, and platform-level consistency.
Traditional experimentation workflows assume that analysis can happen somewhere downstream from release. That assumption does not hold up well anymore.
When development velocity rises, so does the volume of features to evaluate. Teams need faster feedback loops, but they also need stronger confidence in the data behind the decision. If every experiment requires moving data into another system, recreating business metrics, and validating opaque calculations, the bottleneck just shifts from deployment to analysis.
That's the wrong pattern for platform teams.
Platform teams are being asked to support higher release frequency without increasing risk. They need standardized workflows, strong governance, and fewer manual handoffs. They do not need another disconnected toolchain where experimentation introduces more uncertainty than it removes.
Warehouse Native Experimentation addresses that by bringing experimentation closer to the release process and closer to trusted business data at the same time.
This launch matters because it changes how experimentation fits into the software delivery model.


Warehouse Native Experimentation lets teams run analyses directly in supported data warehouses rather than exporting experiment data into an external system first.
That is a meaningful shift.
It means your experiment logic can operate where your product events, business events, and governed data models already exist. Instead of copying data out and hoping definitions stay aligned, teams can work from the warehouse as the source of truth.
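To illustrate the model, the sketch below uses Python's built-in sqlite3 as a stand-in for a real warehouse such as Snowflake or Redshift; the tables and metric are hypothetical, not the FME schema. The point is that the experiment metric is computed by a single inspectable query where the data already lives:

```python
import sqlite3

# sqlite3 stands in here for a real warehouse; table and column
# names are illustrative, not the actual FME schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assignments (user_id TEXT, treatment TEXT);
    CREATE TABLE orders (user_id TEXT, revenue REAL);
    INSERT INTO assignments VALUES ('u1','on'),('u2','on'),('u3','off'),('u4','off');
    INSERT INTO orders VALUES ('u1', 120.0), ('u2', 80.0), ('u3', 50.0);
""")

# One query joins treatment assignments to the business table the
# organization already trusts; no data ever leaves the warehouse.
rows = conn.execute("""
    SELECT a.treatment,
           AVG(COALESCE(o.revenue, 0)) AS revenue_per_user
    FROM assignments a
    LEFT JOIN orders o ON o.user_id = a.user_id
    GROUP BY a.treatment
    ORDER BY a.treatment
""").fetchall()
print(rows)  # [('off', 25.0), ('on', 100.0)]
```

Because the metric is plain SQL against governed tables, a surprising result can be audited by anyone with warehouse access rather than debated as a black box.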
For organizations already invested in platforms like Snowflake or Amazon Redshift, this reduces friction and increases confidence. It also helps avoid the shadow-data problem that shows up when experimentation becomes one more separate analytics island.
Good experimentation depends on metric quality.
Warehouse Native Experimentation lets teams define metrics from the warehouse tables they already trust. That includes product success metrics as well as guardrail metrics that help teams catch regressions before they become larger incidents.
This is a bigger capability than it may appear.
Many experimentation programs fail not because teams lack ideas, but because they cannot agree on what success actually means. When conversion, latency, revenue, or engagement are defined differently across tools, the experiment result becomes negotiable.
Harness moves that discussion in the right direction. The metric should reflect the business reality, not the reporting limitations of a separate experimentation engine.
Speed matters. Trust matters more.
Warehouse Native Experimentation helps teams understand impact with results that are transparent and inspectable. That gives engineering, product, and data teams a better basis for action.
The practical benefit is simple: when a result looks surprising, teams can validate the logic instead of debating whether the tool is doing something hidden behind the scenes.
That transparency is a major part of the launch story. Release fear decreases when teams trust both the rollout controls and the data used to judge success.
Warehouse Native Experimentation is valuable on its own. But its full value shows up when you look at how it fits into the Harness platform.
In the webinar, Lena demonstrated a workflow where a pipeline controlled flag status, targeting, approvals, rollout progression, and even cleanup. I emphasized that “95% of it was run by a single pipeline.”
That is not just a demo detail. It is the operating model platform teams want.
Pipelines make releases consistent. They reduce team-to-team variation. They create auditability. They turn release behavior into a reusable system instead of a series of manual decisions.
Harness FME gives teams the ability to decouple deployment from release, expose features gradually, target specific cohorts, and run experiments as part of a safer delivery motion.
That is already powerful.
It lets teams avoid full application rollback when one feature underperforms. It lets them isolate problems faster. It gives product teams a structured way to learn from real usage without treating every feature launch like an all-or-nothing event.
Warehouse Native Experimentation completes that model.
Now the experiment does not end at exposure control. It continues into governed analysis using the data infrastructure the business already depends on. The result is a tighter loop from release to measurement to decision.
That is why this is a platform launch.
Harness is not asking teams to choose between delivery tooling, experimentation tooling, and warehouse trust. The platform brings those motions together.
That is what “release fearlessly” looks like when it extends beyond deployment.
Engineering leaders should think about this launch as a better operating model for software change.
Release with control. Use pipelines and feature flags to separate deployment from feature exposure.
Verify with the right signals. Use guardrail metrics and rollout logic to contain risk before it spreads.
Learn from trusted data. Run experiments against the warehouse instead of recreating the truth somewhere else.
Standardize the process. Make approvals, measurement, and cleanup part of the same repeatable workflow.
This is especially important for platform teams trying to keep pace with AI-assisted development. More code generation only helps the business if the release system can safely absorb more change and turn it into measurable outcomes.
Warehouse Native Experimentation helps make that possible.
This feature will be especially relevant for teams with mature data warehouses, strong governance requirements, and a need to scale experimentation across many teams.
As software teams push more change through the system, trusted experimentation can no longer sit off to the side. It has to be part of the release model itself.
Harness now gives teams a stronger path to do exactly that: deploy safely, release progressively, and measure impact where trusted data already lives. That is not just better experimentation. It is a better software delivery system.
Ready to see how Harness helps teams release fearlessly with trusted, warehouse-native experimentation? Contact Harness for a demo.
Warehouse Native Experimentation is a capability in Harness FME that lets teams analyze experiment outcomes directly in their data warehouse. That keeps experimentation closer to governed business data and reduces the need to export data into separate analysis systems.
GA signals that the capability is ready for broader production adoption. For platform and product teams, that means Warehouse Native Experimentation can become part of a standardized release and experimentation workflow rather than a limited beta program.
Traditional approaches often require moving event data into a separate system for analysis. Warehouse-native experimentation keeps analysis where the data already lives, which improves trust, reduces operational overhead, and helps align experiment metrics with business definitions.
Safer releases are not only about deployment controls. They also require trusted feedback after release. Warehouse Native Experimentation helps teams learn from production changes using governed warehouse data, making release decisions more confident and more repeatable.
Harness pipelines help standardize the release workflow, while Harness FME controls rollout and experimentation. Warehouse Native Experimentation adds trusted measurement to that same motion, closing the loop from deployment to exposure to decision.
Organizations with mature data warehouses, strong governance requirements, and a need to scale experimentation across teams will benefit most. It is especially relevant for platform teams that want experimentation to be part of a consistent software delivery model.


A financial services company ships code to production 47 times per day across 200+ microservices. Their secret isn't running fewer tests; it's running the right tests at the right time.
Modern regression testing must evolve beyond brittle test suites that break with every change. It requires intelligent test selection, parallel test execution, flaky test detection, and governance that scales with your services.
Harness Continuous Integration brings these capabilities together, using machine learning to detect deployment anomalies and automatically roll back failures before they impact customers. This framework covers definitions, automation patterns, and scale strategies that turn regression testing into an operational advantage. Ready to deliver faster without fear?
Managing updates across hundreds of services makes regression testing a daily reality, not just a testing concept. Regression testing in CI/CD ensures that new code changes don’t break existing functionality as teams ship faster and more frequently. In modern microservices environments, intelligent regression testing is the difference between confident daily releases and constant production risk.
These terms often get used interchangeably, but they serve different purposes in your pipeline. Understanding the distinction helps you avoid both redundant test runs and dangerous coverage gaps.
In practice, you run them sequentially: retest the fix first, then run regression suites scoped to the affected services. For microservices environments with hundreds of interdependent services, this sequencing prevents cascade failures without creating deployment bottlenecks.
The challenge is deciding which regression tests to run. A small change to one service might affect three downstream dependencies, or even thirty. This is where governance rules help. You can set policies that automatically trigger retests on pull requests and broader regression suites at pre-production gates, scoping coverage based on change impact analysis rather than gut feel.
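A minimal sketch of that impact analysis, assuming a hypothetical reverse dependency graph maintained by the platform team, looks like this:

```python
from collections import deque

# Hypothetical reverse dependency graph: service -> services that call it.
DEPENDENTS = {
    "payments": ["checkout", "billing"],
    "checkout": ["storefront"],
    "billing": [],
    "storefront": [],
}

def impacted_services(changed: set[str]) -> set[str]:
    """Walk downstream dependents of the changed services (BFS)."""
    seen, queue = set(changed), deque(changed)
    while queue:
        svc = queue.popleft()
        for dep in DEPENDENTS.get(svc, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

# A change to "payments" pulls in checkout, billing, and storefront,
# so the pipeline runs only those services' regression suites.
print(sorted(impacted_services({"payments"})))
# ['billing', 'checkout', 'payments', 'storefront']
```

Scoping suites this way replaces gut feel with a reproducible rule: run everything downstream of the change, and nothing else.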
To summarize: regression testing checks that existing functionality still works after a change. Retesting verifies that a specific bug fix works as intended. Both are essential, but they serve different purposes in CI/CD pipelines.
The regression testing process works best when it matches your delivery cadence and risk tolerance. Smart timing prevents bottlenecks while catching regressions before they reach users.
This layered approach balances speed with safety. Developers get immediate feedback while production deployments include comprehensive verification. Next, we'll explore why this structured approach becomes even more critical in microservices environments where a single change can cascade across dozens of services.
Modern enterprises managing hundreds of microservices face three critical challenges: changes that cascade across dependent systems, regulatory requirements demanding complete audit trails, and operational pressure to maintain uptime while accelerating delivery.
A single API change can break dozens of downstream services you didn't know depended on it.
Financial services, healthcare, and government sectors require documented proof that tests were executed and passed for every promotion.
Catching regressions before deployment saves exponentially more than fixing them during peak traffic.
With the stakes clear, the next question is which techniques to apply.
Modern CI/CD demands regression testing that balances thoroughness with velocity. The most effective techniques fall into three categories: selective execution, integration safety, and production validation, with a few pragmatic variants you'll use day-to-day.
These approaches work because they target specific failure modes. Smart selection outperforms broad coverage when you need both reliability and rapid feedback.
Managing regression testing across 200+ microservices doesn't require days of bespoke pipeline creation. Harness Continuous Integration provides the building blocks to transform testing from a coordination nightmare into an intelligent safety net that scales with your architecture.
Step 1: Generate pipelines with context-aware AI. Start by letting Harness AI build your pipelines based on industry best practices and the standards within your organization. The approach is interactive, and you can refine the pipelines with Harness as your guide. Ensure that the standard scanners are run.
Step 2: Codify golden paths with reusable templates. Create Harness pipeline templates that define when and how regression tests execute across your service ecosystem. These become standardized workflows embedding testing best practices while giving developers guided autonomy. When security policies change, update a single template and watch it propagate to all pipelines automatically.
Step 3: Enforce governance with Policy as Code. Use OPA policies in Harness to enforce minimum coverage thresholds and required approvals before production promotions. This ensures every service meets your regression standards without manual oversight.
With automation in place, the next step is avoiding the pitfalls that derail even well-designed pipelines.
Regression testing breaks down when flaky tests erode trust and slow suites block every pull request. These best practices focus on governance, speed optimization, and data stability.
Regression testing in CI/CD enables fast, confident delivery when it’s selective, automated, and governed by policy. With the right strategies, it transforms from a release bottleneck into an automated protection layer: selective test prioritization, automated regression gates, and policy-backed governance create confidence without sacrificing speed.
The future belongs to organizations that make regression testing intelligent and seamless. When regression testing becomes part of your deployment workflow rather than an afterthought, shipping daily across hundreds of services becomes the norm.
Ready to see how context-aware AI, OPA policies, and automated test intelligence can accelerate your releases while maintaining enterprise governance? Explore Harness Continuous Integration and discover how leading teams turn regression testing into their competitive advantage.
These practical answers address timing, strategy, and operational decisions platform engineers encounter when implementing regression testing at scale.
Run targeted regression subsets on every pull request for fast feedback. Execute broader suites on main-branch merges with parallelization. Schedule comprehensive regression testing before production deployments, then use core end-to-end tests as synthetic checks during canary rollouts to catch issues under live traffic.
Retesting validates a specific bug fix — did the payment timeout issue get resolved? Regression testing ensures that the fix doesn’t break related functionality like order processing or inventory updates. Run retests first, then targeted regression suites scoped to affected services.
There's no universal number. Coverage requirements depend on risk tolerance, service criticality, and regulatory context. Focus on covering critical user paths and high-risk integration points rather than chasing percentage targets. Use policy-as-code to enforce minimum thresholds where compliance requires it, and supplement test coverage with AI-powered deployment verification to catch regressions that test suites miss.
No. Full regression on every commit creates bottlenecks. Use change-based test selection to run only tests affected by code modifications. Reserve comprehensive suites for nightly runs or pre-release gates. This approach maintains confidence while preserving velocity across your enterprise delivery pipelines.
Quarantine flaky tests immediately, rather than letting them block pipelines. Tag unstable tests, move them to separate jobs, and set clear SLAs for fixes. Use failure strategies like retry logic and conditional execution to handle intermittent issues while maintaining deployment flow.
Treat test code with the same rigor as application code. That means version control, code reviews, and regular cleanup of obsolete tests. Use policy-as-code to enforce coverage thresholds across teams, and leverage pipeline templates to standardize how regression suites execute across your service portfolio.


When an offensive security AI agent can compromise one of the world’s most sophisticated consulting firms in under two hours with no credentials, guidance, or insider knowledge, it’s not just a breach but a warning sign to the industry.
That’s exactly what happened when an AI agent targeted McKinsey’s Generative AI platform, Lilli. The agent chained together application flaws, API misconfigurations, and AI-layer vulnerabilities into a machine-speed attack. This wasn’t a novel zero-day exploit. It was the exploitation of familiar application security gaps and newer AI attack vectors, amplified by AI speed, autonomy, and orchestration.
Enterprises are already connecting functionality and troves of data through APIs. Increasingly, they’re wiring up applications with Generative AI and agentic workflows to accelerate their businesses. The risk of intellectual property loss and sensitive data exposure is amplified exponentially. Organizational teams must rethink their AI security strategy and likely also revisit API security in parallel.
Let’s be precise about what happened, without blaming McKinsey for moving at a pace that much of the industry is already adopting for application and AI technology.
The offensive AI agent probing McKinsey’s AI system was quickly able to:
From there, the AI agent accessed:
Even experienced penetration testers don’t move this fast, not without AI tools to augment their testing. Many would struggle to find the type of SQL injection flaw present, let alone all the other elements in the attack chain.
What makes this security incident different and intriguing is how the AI agent crossed layers of the technology stack that are now prominent in AI-native designs.
McKinsey’s search API was vulnerable to blind SQL injection. The AI agent discovered that while values were parameterized (a security best practice), it could still inject into JSON keys used as field names in the backend database and analyze the resulting error messages. Through continued probing and evaluation of these error messages, the agent mapped the query structure and extracted production data.
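The general pattern, not McKinsey's actual code, can be illustrated in a few lines: values are bound safely, but a field name taken from the request payload is interpolated into the SQL, and an identifier allowlist closes the hole. The schema here is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, body TEXT)")
conn.execute("INSERT INTO docs VALUES ('q3 plan', 'internal')")

def search_unsafe(field: str, value: str):
    # Values are parameterized (good), but the field name from the JSON
    # payload is interpolated into the SQL (bad): the injectable surface.
    return conn.execute(
        f"SELECT title FROM docs WHERE {field} = ?", (value,)).fetchall()

ALLOWED_FIELDS = {"title", "body"}

def search_safe(field: str, value: str):
    # Fix: allowlist identifiers; only values are ever attacker-controlled.
    if field not in ALLOWED_FIELDS:
        raise ValueError("unknown search field")
    return conn.execute(
        f"SELECT title FROM docs WHERE {field} = ?", (value,)).fetchall()

# A crafted "field name" rewrites the query structure: the tautology
# makes the WHERE clause always true, regardless of the bound value.
print(search_unsafe("'' = '' OR body", "nonsense"))  # returns every row
print(search_safe("title", "q3 plan"))               # [('q3 plan',)]
```

In a blind variant the attacker never sees rows directly, only error messages or response timing, but the root cause is the same: attacker-controlled identifiers reaching the query builder.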
These are long-known weaknesses in how applications are secured. Many organizations rely on web application firewall (WAF) instances to filter and monitor web application traffic and to stop attacks such as SQL injection. However, attack methods constantly evolve. Blind SQL injection, where attackers infer information from the system without seeing direct results, is harder to detect and works by analyzing system responses to invalid queries, such as those that delay server response. These attacks can also be made to look like normal data traffic.
Security teams need monitoring capabilities that analyze application traffic over time to identify anomalous behaviors and the signals of an attack.
The offensive agent quickly performed reconnaissance of McKinsey’s system to understand its API footprint and discovered that 22 API endpoints were unauthenticated, one of which served as the initial and core point of compromise.
The public API documentation served as a roadmap for the AI agent, detailing the system's structure and functionality. This presents a tricky proposition, since well-documented APIs and API schema definitions are critical to increasing adoption of productized APIs, enabling AIs to find your services, and facilitating agent orchestration.
APIs aren’t just data pipes anymore; they’re also control planes for AI systems.
APIs serve as control planes in AI-native designs, managing the configuration of model commands and access controls, and also connecting the various AI and data services. Compromising this layer enables attackers to manipulate AI configuration, control AI behavior, and exfiltrate data.
The major oversight here was the presence of 22 unauthenticated API endpoints that allowed unfettered access. This is a critical API security vulnerability, known as broken authentication.
Lack of proper authorization enabled the AI agent to manipulate unique identifiers assigned to data objects within the API calls, increase its own access permissions (escalate privileges), and retrieve other users' data. The weakness is commonly known as broken object-level authorization (BOLA), where system checks fail to restrict user or machine access to specific data. McKinsey’s AI design also allowed direct API access to backend systems, potentially exposing internal technical resources and violating zero-trust architecture (ZTA) principles. With ZTA, you must presume that the given identity and the environment are compromised, operate with least privilege, and ensure controls are in place to limit blast radius in the event of an attack. At a minimum, all identities must be continuously authenticated and authorized before accessing resources.
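A minimal sketch of the missing check, with invented data, shows why object-level authorization has to be enforced against the object itself rather than the endpoint:

```python
# Hypothetical object store: ownership is recorded per object.
DOCUMENTS = {
    "doc-1": {"owner": "alice", "body": "alice's notes"},
    "doc-2": {"owner": "bob",   "body": "bob's notes"},
}

def get_document(requesting_user: str, doc_id: str) -> str:
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        raise KeyError("not found")
    # The BOLA fix: authorize against the object, not just the route.
    # An authenticated user still may not read someone else's object.
    if doc["owner"] != requesting_user:
        raise PermissionError("not authorized for this object")
    return doc["body"]

print(get_document("alice", "doc-1"))   # alice's notes
try:
    get_document("alice", "doc-2")      # ID tampering is rejected
except PermissionError as exc:
    print(exc)
```

Without the ownership check, any authenticated caller who increments `doc_id` reads other users' data, which is exactly the identifier manipulation described above.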
A breach in an AI system essentially provides centralized access to all organizational knowledge. A successful intrusion can grant control over system logic via features such as writable system prompts. This enables attackers to rewrite AI guardrails, subtly steering AI to bypass compliance policies, generate malicious code, or leak sensitive information.
New risks arise when organizations aim to improve AI system usefulness by grounding them with other sources (e.g., web searches, databases, documents, files) or using retrieval-augmented generation (RAG) pipelines that connect data sources to AI systems. This is done to tweak the prompts sent to LLMs and improve the quality of responses. However, attackers exploit these connections to corrupt the information processing or trick the AI into revealing sensitive or proprietary data.
With its elevated access, the AI agent had the ability to gain influence over:
A breach in the AI layer is not just a security incident, but a core attack on the integrity and competence of the business.
The rise of generative AI has further dissolved traditional security perimeters and created critical new attack vectors. Attackers can now target core mechanisms of institutional intelligence and reasoning, not just data.
Traditional "defense in depth" thinking segments application and AI protection into isolated layers, commonly WAFs, API gateways, API runtime security, and AI guardrails. While offering granular protection, such approaches inadvertently create a critical security blind spot: they fail to track sophisticated, multi-stage attacks that exploit handoffs between application layers.
Modern attacks are fluid campaigns. They may target frontend code as the initial attack vector, abuse APIs to attack business logic, bypass access controls enforced by gateways, pivot to database services for data exfiltration, and leverage access to manipulate reasoning of AI services.
The fatal flaw is the inability to maintain a single, unbroken chain of contextual awareness across the entire sequence. Each isolated WAF, gateway, or AI guardrail only sees a segment of the event and loses visibility once the request passes to the next layer. This failure to correlate events in real-time across APIs, applications, databases, and AI services is the blind spot that attackers exploit. By the time related signals are gathered and correlated in an organization’s SIEM, the breach has already occurred. True resilience requires a unified runtime platform to quickly identify, correlate, and respond to complex application attack chains.
To connect signals and stop advanced attacks, organizations need correlated visibility and control across their application, API, and AI footprint. This essential capability comes from three key elements.
A platform must identify your application assets by combining and analyzing traffic signals from:
Runtime protection must go beyond simple authentication checks. It requires a deep understanding of other application context including:
Threat detection and prevention must happen at multiple levels during runtime, which include:
The incident with McKinsey's AI system didn’t introduce new vulnerabilities. It revealed something more important.
AI systems amplify every weakness across your stack, and AI excels at finding them.
Act now by reevaluating your AI security posture, unifying security monitoring, and bridging gaps that AI can exploit before attackers do.
It’s fortunate this event was essentially a research experiment and not a motivated threat actor. Attackers are already thinking in terms of AI-native designs. It’s not about endpoints or services for them; it’s about attack chains that enable them to get to your organization’s data or intelligence.
When reviewing your application security strategy, it’s not whether you have application firewalls, API protection, or AI guardrails to mitigate attacks; it’s whether they work together effectively.
Eight years ago, we shipped Continuous Verification (CV) to solve one of the most miserable parts of a great engineer’s job: babysitting deployments.
The idea was simple but powerful. At 3:00 AM, your best engineers shouldn't be staring at dashboards waiting to see if a release went sideways. CV was designed to think like those engineers, watching your APM metrics, scanning your logs, and making the call for you. Roll forward or roll back, automatically, based on what the data actually said.
It worked. Customers loved it. Hundreds of teams stopped losing sleep over deployments.
But somewhere along the way, we noticed a new problem creeping in: setting up CV had become its own burden.
To get value from Continuous Verification, you had to know what to look for. Which metrics matter for this service? Which log patterns indicate trouble? Which thresholds separate a blip from a real incident?
When we talk to teams trying to use Argo Rollouts and set up automatic verification with its analysis templates, we hear that they hit the same challenges.
For teams with deep observability expertise, this was fine. For everyone else—and honestly, for experienced teams onboarding new services—it added friction that shouldn't exist. We’d solved the hardest part of deployments, but we’d left engineers with a new "homework assignment" just to get started.
That’s what AI Verification & Rollback is designed to fix.
AI Verification & Rollback builds directly on the CV foundation you already trust, but adds a layer of intelligence before the analysis even begins. Instead of requiring you to define your metrics and log queries upfront, the system queries your observability provider—via MCP server—at the moment of deployment to determine what actually matters for the service you just deployed.
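Harness has not published the exact verdict logic, but the shape of the decision can be sketched as a simple anomaly test: compare post-deploy signals to a pre-deploy baseline and roll back when the deviation is extreme. The metric, window, and threshold below are illustrative assumptions, not the product's algorithm:

```python
import statistics

def verify(baseline: list[float], canary: list[float],
           z_threshold: float = 3.0) -> str:
    """Compare post-deploy samples to the pre-deploy baseline and
    return a rollback verdict when the deviation is anomalous."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline) or 1e-9  # guard a flat baseline
    worst_z = max((x - mean) / stdev for x in canary)
    return "rollback" if worst_z > z_threshold else "promote"

# Hypothetical per-minute error rates pulled from an observability provider.
baseline_error_rate = [0.010, 0.012, 0.011, 0.009, 0.013]
print(verify(baseline_error_rate, [0.011, 0.012, 0.010]))  # promote
print(verify(baseline_error_rate, [0.011, 0.094, 0.120]))  # rollback
```

The value of the AI layer is upstream of this check: choosing which metrics and log patterns to feed it, which is exactly the configuration work the new capability removes.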
In practice, that means the verification step arrives already knowing which signals to watch, with no upfront metric or log configuration required.
At our user conference six months ago, we showed this running live—triggering a real deployment, watching the MCP server query Dynatrace for relevant signals, and walking through a live failure analysis that caught a bad release within minutes. The response was immediate. Engineers got it instantly, because it matched how they already think about post-deploy monitoring.
We’ve spent the past six months hardening what we showed you.
We're not declaring CV legacy today. AI Verification & Rollback is not yet a full replacement for traditional Continuous Verification across all use cases and customer configurations. CV remains the right choice for many teams, and we're committed to supporting it.
Bottom line: AI V&R is ready for many teams to use. It's available now, and for teams setting up verification for the first time—or looking to reduce the operational overhead of maintaining verification configs—it's the faster, smarter path forward.
The takeaway here is simple: If you've been putting off setting up Continuous Verification because of the configuration overhead, this is the version you were waiting for.
Ready to stop babysitting your releases? Drop the AI V&R step into your next pipeline and see what it finds.
How is your team currently handling the "3:00 AM dashboard stare"—and how much time would you save if the pipeline just told you why it rolled back?


AI has officially made writing code cheap.
Your developers are shipping more changes, across more microservices, more frequently than ever before. If you’re a developer, it feels like a golden age.
But for the Release Engineer? This isn't necessarily a celebration; it’s a scaling nightmare.
We’re currently seeing what I call the "AI delivery gap." It’s that uncomfortable space between the breakneck speed at which we can now generate code and the manual, spreadsheet-driven processes we still use to actually release it.
The reality is that while individual CI/CD pipelines might be automated, the coordination between them remains a stubbornly human bottleneck. We’ve automated the "how" of shipping code, but we’re still stuck in the Dark Ages when it comes to the "when" and "with whom."
Today, we are introducing Harness Release Orchestration alongside four other capabilities that ensure confident releases. Release Orchestration is designed to transform the release management process from a fragmented, manual effort into a standardized, visible, and scalable operation.

Most release engineers I talk to spend about 40% of their time "chasing humans for status." You’re checking Slack threads for sign-offs, updating Confluence pages, and obsessively watching spreadsheets to ensure Team A’s service doesn't break Team B’s dependency. (And let’s be honest, it usually does anyway.)
We could call it a team sport, but it’s really a multi-team sport. Teams from multiple services and functions need to come together to deliver a big release.
If we rely on a person to coordinate, we can’t move fast enough.
Harness Release Orchestration moves beyond the single pipeline. It introduces a process-based framework that acts as your release "blueprint."
Release management software isn’t an entirely new idea. It’s been tried before, but never widely adopted. The industry went wrong by building separate tools for continuous delivery and release orchestration.
With separate tools, you incur integration overhead, juggle multiple places to look, and live with awkward handoffs between systems.
We’ve built ours alongside our CD experience, so everything is as seamless and fast as possible. Yes, this is for releases more complex than a single microservice that an app team delivers on its own. No, that doesn’t mean introducing heavyweight processes and standalone tools.
Here’s the “gotcha”: the biggest barrier to adopting a new release tool is the hassle of migrating. You likely have years of proven workflows documented in SharePoint/Confluence, in early-release management tools like XL Release, or in the fading memory of that one person who isn't allowed to retire.
Harness AI now handles the heavy lifting. Our AI Process Ingestion can instantly generate a comprehensive release process from a simple natural-language prompt, existing documentation, or export from a tool.
What used to take months of manual configuration now takes seconds. Simply put, we’re removing the friction of modernization.
For the Release Engineer, the goal is leverage. You shouldn't need to perform heroics every Friday night to ensure a successful release. (Though if you enjoy the adrenaline of a 2:00 AM war room, I suppose I can’t stop you.)
Harness Release Orchestration creates a standardized release motion that scales with AI-driven output. It allows you to move from being a "release waiter" to a "release architect."
AI made writing code cheap. Harness makes releasing it safe, scalable, and sustainable.


Release Orchestration replaces Slack threads, spreadsheets, and war-room calls that still coordinate most multi-team releases. Services and the teams supporting them move through shared orchestration logic with the same controls, gates, and sequence, so a release behaves like a system rather than a series of handoffs. And everything is seamlessly integrated with Harness Continuous Delivery, rather than in a separate tool.
AI-Powered Verification and Rollback connects to your existing observability stack, automatically identifies which signals matter for each release, and determines in real time whether a rollout should proceed, pause, or roll back. Most teams have rollback capability in theory. In practice it's an emergency procedure, not a routine one. Ancestry.com made it routine and saw a 50% reduction in overall production outages, with deployment-related incidents dropping significantly.
Database DevOps, now with Snowflake support, brings schema changes into the same pipeline as application code, so the two move together through the same controls with the same auditability. If a rollback is needed, the application and database schema can roll back together seamlessly. This matters especially for teams building AI applications on warehouse data, where schema changes are increasingly frequent and consequential.
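The coordinated-rollback pattern itself is generic: apply app and schema steps as one unit, and if any step fails, undo the completed ones in reverse order. A minimal sketch, with invented step names and a simulated failure, not the Harness implementation:

```python
# Toy coordinated release: schema migration and app deploy succeed or
# roll back together. The failure in deploy_app is simulated.

class Step:
    def __init__(self, name, apply_fn, rollback_fn):
        self.name, self.apply_fn, self.rollback_fn = name, apply_fn, rollback_fn

def release(steps, state):
    """Apply steps in order; on failure, roll back completed steps in reverse."""
    completed = []
    try:
        for step in steps:
            step.apply_fn(state)
            completed.append(step)
        return "released"
    except Exception:
        for step in reversed(completed):
            step.rollback_fn(state)
        return "rolled back"

state = {"schema_version": 1, "app_version": "1.0"}

def bump_schema(s): s["schema_version"] = 2
def revert_schema(s): s["schema_version"] = 1
def deploy_app(s): raise RuntimeError("health check failed")  # simulated failure
def revert_app(s): s["app_version"] = "1.0"

steps = [Step("migrate-db", bump_schema, revert_schema),
         Step("deploy-app", deploy_app, revert_app)]
print(release(steps, state), state)  # schema reverted along with the failed deploy
```

The failure mode this prevents is the familiar one: an app rollback that leaves the database on a schema the old code cannot read.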
Improved pipeline and policy support for feature flags and experimentation lets teams deploy safely and release progressively to the right users, even as AI-generated code drives up the number of releases. Teams can quickly measure impact on technical and business metrics, and stop or roll back when results are off track, all within the familiar Harness interface they already use for CI/CD.
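Progressive exposure typically rests on deterministic bucketing: each user hashes to a stable position, so a given user's flag decision never flips between requests, and ramping from 5% to 50% only adds users. A minimal sketch of that idea, not the Harness implementation:

```python
# Deterministic percentage rollout: hash (flag, user) to a stable 0-100
# bucket, then expose the user if the bucket falls under the rollout percent.
import hashlib

def exposed(user_id: str, flag: str, rollout_pct: float) -> bool:
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF * 100  # stable bucket in [0, 100]
    return bucket < rollout_pct

# The same user always gets the same answer at a given percentage,
# and raising the percentage can only turn the flag on, never off.
print(exposed("user-42", "new-checkout", 50))
```

Keying the hash on the flag name as well as the user means different flags ramp over independent slices of the user base rather than always hitting the same early cohort.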
Warehouse-Native Feature Management and Experimentation lets teams test features and measure business impact directly with data warehouses like Snowflake and Redshift, without ETL pipelines or shadow infrastructure. This way they can keep PII and behavioral data inside governed environments for compliance and security.
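Warehouse-native analysis means the experiment metric is computed where the data already lives, in SQL, rather than exported through ETL. In this sketch, sqlite stands in for Snowflake or Redshift, and the table and column names are invented:

```python
# Toy warehouse-native experiment readout: conversion rate per variant,
# computed by a single SQL query against the (stand-in) warehouse.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE exposures (user_id TEXT, variant TEXT);
CREATE TABLE conversions (user_id TEXT);
INSERT INTO exposures VALUES ('u1','control'),('u2','control'),
                             ('u3','treatment'),('u4','treatment');
INSERT INTO conversions VALUES ('u2'),('u3'),('u4');
""")

rows = conn.execute("""
SELECT e.variant,
       AVG(CASE WHEN c.user_id IS NOT NULL THEN 1.0 ELSE 0.0 END) AS conv_rate
FROM exposures e
LEFT JOIN conversions c ON c.user_id = e.user_id
GROUP BY e.variant
ORDER BY e.variant
""").fetchall()
print(rows)  # [('control', 0.5), ('treatment', 1.0)]
```

Because the join and aggregation run inside the governed environment, user-level exposure and conversion rows never leave the warehouse; only the aggregate comes out.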
These aren't five separate features. They're one answer to one question: can we safely keep going at AI speed?
Traditional CD pipelines treat deployment as the finish line. The model Harness is building around treats it as one step in a longer sequence: application and database changes move through orchestrated pipelines together, verification checks real-time signals before a rollout continues, features are exposed progressively, and experiments measure actual business outcomes against governed data.
A release isn't complete when the pipeline finishes. It's complete when the system has confirmed the change is healthy, the exposure is intentional, and the outcome is understood.
That shift from deployment to verified outcome is what Harness customers say they need most. "AI has made it much easier to generate change, but that doesn't mean organizations are automatically better at releasing it," said Marc Pearce, Head of DevOps at Intelliflo. "Capabilities like these are exactly what teams need right now. The more you can standardize and automate that release motion, the more confidently you can scale."
The real shift here is operational. The work of coordinating a release today depends heavily on human judgment, informal communication, and organizational heroics. That worked when the volume of change was lower. As AI development accelerates, it's becoming the bottleneck.
The release process needs to become more standardized, more repeatable, and less dependent on any individual's ability to hold it together at the moment of deployment. Automation doesn't just make releases faster. It makes them more consistent, and consistency is what makes scaling safe.
At Ancestry.com, implementing Harness helped the team achieve 99.9% uptime by cutting outages in half while tripling deployment velocity.
At Speedway Motors, progressive delivery and 20-second rollbacks enabled a move from biweekly releases to multiple deployments per day, with enough confidence to run five to 10 feature experiments per sprint.
AI made writing code cheap. Releasing that code safely, at scale, is still the hard part.
Harness Release Orchestration, AI-Powered Verification and Rollback, Database DevOps, Warehouse-Native Feature Management and Experimentation, and Improved Pipeline and Policy Support for FME are available now. Learn more and book a demo.
Need more info? Contact Sales