
Key Takeaway: The Harness MCP Server is now in the official Claude Connectors Directory. Developers using Claude can now discover and connect to Harness, gaining structured, real-time access to their pipelines, deployments, approvals, and delivery workflows. What makes this different from a typical API integration is what's underneath: the Harness Software Delivery Knowledge Graph, which gives Claude the context it needs to make decisions that are accurate, fast, and safe.
AI agents are only as good as the context they operate in. That's not a design philosophy. It's a practical constraint. An AI agent that doesn't understand how the underlying software delivery entities relate to each other, or what the data actually means, will get things wrong. In software delivery, wrong looks like a botched deployment, a misread failure, or an approval granted when it shouldn't have been, which directly affects your users.
Today, we're announcing that the Harness MCP Server is in the official Claude Connectors Directory, making Harness discoverable and connectable for every team using Claude. But the announcement isn't really about the directory listing. It's about what Harness + Claude can actually do in your delivery system.

Claude can work across the full Harness delivery platform:

All of it is grounded in the Knowledge Graph: not raw API responses, but a structured model of your delivery system that Claude can reason over precisely.
MCP lets AI models call external tools by reading API descriptions and deciding which to invoke. That flexibility is useful. But when you're building an agent that needs to reason across an entire software delivery lifecycle (CI, CD, security scans, approvals, feature flags, cost signals, and environments), raw API access creates a deep reliability problem.
Consider a question a platform engineering lead might ask:
"Show me the pipelines with the highest failure rate over the last 30 days, and for each one, tell me which services they deploy and whether any of those services have open critical vulnerabilities."
That question spans four domains: pipeline execution history, service-to-pipeline relationships, environment state, and security scan results. An agent working off raw APIs has to discover which APIs exist across each domain, call them in the right order, paginate correctly, infer how field names correspond across systems, and synthesize the results without misinterpreting nested objects or guessing at relationships.
The result is 5+ sequential LLM calls, hundreds of thousands of input tokens, high latency, and an agent that had to guess at every join. Guessing is where hallucinations happen.
The Harness Software Delivery Knowledge Graph is a purpose-built model of everything that happens after code is written: builds, test runs, deployments, approvals, security scans, environment states, feature flags, infrastructure changes, cost signals, and rollbacks. Not as raw data but as a connected, typed, semantically annotated graph of entities and relationships.
Every field in the graph carries metadata that tells an agent exactly how to use it: whether a value is a number or a string, whether it can be aggregated or only filtered, what its unit is, and how it joins to related entities. Cross-module relationships, between a pipeline and the services it deploys, between a deployment and the security scan results for that artifact, between an environment change and the cost anomaly that followed, are explicitly declared, not inferred.
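As a rough illustration of what field-level metadata buys an agent, here is a minimal Python sketch. The `FieldMeta` class, the `PIPELINE_EXECUTION` entity, and its field names are hypothetical and are not the actual Harness schema; the point is only that a query can be rejected up front instead of guessed at.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldMeta:
    """Hypothetical per-field metadata, loosely modeled on the description above."""
    name: str
    dtype: str                      # "number" or "string"
    aggregatable: bool              # False means the field is filter-only
    unit: Optional[str] = None      # e.g. "seconds"
    joins_to: Optional[str] = None  # related entity, when a join is declared


# Illustrative entity definition; not the real Harness Knowledge Graph.
PIPELINE_EXECUTION = {
    "duration": FieldMeta("duration", "number", True, unit="seconds"),
    "status": FieldMeta("status", "string", False),
    "service_id": FieldMeta("service_id", "string", False, joins_to="Service"),
}


def validate_aggregation(fields, field_name):
    """Fail loudly when a field is unknown or cannot be aggregated."""
    meta = fields.get(field_name)
    if meta is None:
        raise KeyError(f"unknown field: {field_name}")
    if not meta.aggregatable:
        raise ValueError(f"{field_name} can be filtered but not aggregated")
    return meta
```

With metadata like this, asking for an average of `status` fails before any query runs, rather than producing a silently wrong answer.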
This is the difference between an agent that can access your delivery system and one that understands it.
When Claude connects to Harness via MCP, it doesn't receive a set of API endpoints. It gets access to a structured model of your entire delivery organization, one where the relationships are known, the data types are enforced, and the agent can construct precise queries rather than guessing at field semantics.
The practical effect with Harness + Claude: that same cross-domain question above becomes 2–3 structured queries against a known schema. The agent selects the right entity types from the graph, generates queries with exact fields and declared relationships, and returns a deterministic answer. No guesswork. No hallucinated field names. No silent wrong answers.
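To make the shape of that cross-domain question concrete, the sketch below answers it over a tiny in-memory dataset. Everything here is an assumption for illustration: the pipeline and service names, the dictionaries standing in for declared relationships, and the `report` function. It does not reflect how Harness stores or queries this data; it only shows that once the joins are declared, the answer is a plain, deterministic computation.

```python
from collections import defaultdict

# Hypothetical data standing in for three domains.
executions = [
    {"pipeline": "checkout-deploy", "status": "failed"},
    {"pipeline": "checkout-deploy", "status": "success"},
    {"pipeline": "search-deploy", "status": "success"},
    {"pipeline": "search-deploy", "status": "success"},
]
deploys = {"checkout-deploy": ["checkout"], "search-deploy": ["search"]}
open_critical_vulns = {"checkout": 2, "search": 0}


def failure_rates(rows):
    """Failure rate per pipeline from execution history."""
    totals, failures = defaultdict(int), defaultdict(int)
    for row in rows:
        totals[row["pipeline"]] += 1
        if row["status"] == "failed":
            failures[row["pipeline"]] += 1
    return {p: failures[p] / totals[p] for p in totals}


def report(rows, pipeline_to_services, vulns):
    """Rank pipelines by failure rate and attach service vulnerability status."""
    rates = failure_rates(rows)
    ranked = sorted(rates, key=rates.get, reverse=True)
    return [
        {
            "pipeline": p,
            "failure_rate": rates[p],
            "services": [
                {"name": s, "has_open_critical": vulns.get(s, 0) > 0}
                for s in pipeline_to_services.get(p, [])
            ],
        }
        for p in ranked
    ]
```

Calling `report(executions, deploys, open_critical_vulns)` returns the pipelines ordered by failure rate, each with its services and a flag for open critical vulnerabilities.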
A build has failed. Normally, you'd open the Harness UI, navigate to the execution, copy the relevant logs, paste them into a conversation, and wait for analysis. The AI reasons over whatever you managed to capture.
With the Harness MCP connection active in Claude, you ask what failed. Claude doesn't just pull logs; it queries the Knowledge Graph to understand the structure of that pipeline, which stage failed, what services were involved, whether similar failures have occurred before, and what changed since the last successful run. The answer it surfaces reflects the full delivery context, not just the stack trace you happened to copy.

Your team is ready to move a service from staging to production. Claude checks the current environment state, verifies that required approval gates have been satisfied, confirms the security scan passed for the artifact version you're promoting, and initiates the deployment — with every action running through your existing RBAC policies and logged for audit.
The agent isn't guessing about whether conditions are met. It's querying a graph where those conditions are modeled as typed relationships with known states. The answer is deterministic because the data is structured to make it so.
The natural question when Claude can trigger pipelines and manage deployments: what stops it from doing something it shouldn't?
The same controls that govern everything else in Harness. Every action taken through the MCP server runs through your existing RBAC permissions, OPA policy enforcement, approval gates, and audit logging. Claude operates with exactly the permissions you have, nothing more. Every action is tracked. Nothing bypasses the governance layer.
The Knowledge Graph reinforces this: because Harness AI understands your delivery system structurally, it also understands the constraints within it. Approval gates aren't just optional steps the agent might skip; they're modeled as typed relationships with state. The agent can't promote past a gate that hasn't cleared because the graph reflects that clearly.
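One way to picture gates as state rather than optional steps is the sketch below. The `GateState` enum, `PromotionRequest` class, and `blockers` function are hypothetical names invented for this example, not Harness APIs. The promotion check simply reads the modeled state and refuses to proceed if any required condition is not met.

```python
from dataclasses import dataclass
from enum import Enum


class GateState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass(frozen=True)
class PromotionRequest:
    """Hypothetical snapshot of the conditions attached to a promotion."""
    artifact: str
    target_env: str
    gate_states: tuple  # one GateState per required approval gate
    scan_passed: bool


def blockers(req):
    """Return the reasons a promotion cannot proceed (empty list means clear)."""
    reasons = []
    for index, state in enumerate(req.gate_states):
        if state is not GateState.APPROVED:
            reasons.append(f"gate {index} is {state.value}")
    if not req.scan_passed:
        reasons.append("security scan has not passed")
    return reasons


def can_promote(req):
    return not blockers(req)
```

Because the check returns reasons rather than a bare boolean, an agent can report exactly which gate is still pending instead of retrying blindly.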
Speed and governance aren't a tradeoff. They coexist by design.
The Claude Connectors Directory is a curated, reviewed set of integrations. Anthropic evaluates each server before listing it. Being approved is a signal of trust that carries weight for enterprise teams deciding which AI integrations to enable.
It also means discoverability at scale: engineering teams using Claude for DevOps workflows will find Harness natively. One-click OAuth connection, no API key management, no manual configuration.
This fits a broader pattern. The Google Cloud partnership brought Harness into Google's AI ecosystem through Vertex AI and Gemini CLI. The Cursor plugin brought it into the IDE. The Claude Connectors Directory brings it into conversational AI. In each case, the goal is the same: wherever developers are doing their best thinking and wherever AI is being asked to help with software delivery, Harness should be present with the right context for that AI to act reliably.
If you're already a Harness customer:
If you're new to Harness, sign up for free and connect from day one. Detailed steps are listed in the documentation.
The Harness Connector gives Claude the ability to act in your delivery system. The Knowledge Graph gives it the understanding to act well. Together, that's what reliable AI in software delivery actually looks like.

TLDR: Today, Harness is introducing the Harness Cursor Plugin, bringing the power of the Harness AI-native software delivery platform directly into Cursor. This integration, along with the Harness Secure AI Coding hook for Cursor, allows developers and AI agents to move from code changes to vulnerability detection, CI/CD execution, security validation, approvals, deployments, and operational insight without leaving the editor.
AI has completely changed how we write code. You can spin up functions, refactor entire files, and generate tests in seconds. The inner loop, writing and iterating on code, has never been faster. But the moment you try to ship that code, everything slows down. This is what we call the AI Velocity Paradox.
You are suddenly back to juggling pipelines, waiting on approvals, checking security scans, debugging failed runs, and bouncing between tools just to get a change into production.
That gap, between fast code and slow delivery, is what we kept running into. So we built something to fix it.
Today, we are introducing the Harness Plugin for Cursor, a way to go from PR to production without leaving your editor.
If you are using agentic coding tools, such as Cursor, you have probably felt this.
You can:
But shipping still depends on everything outside your editor:
And none of that got simpler just because AI showed up. In fact, AI makes the problem more obvious.
Now you can create changes faster than your delivery process can safely handle. And if your delivery controls are not tight, you are introducing a whole new category of risk: fast-moving code with fragmented governance.
AI did not break software delivery. It exposed how disconnected it already was.
Instead of jumping between tools, what if you could just tell your editor what you want to happen?
Something like:
“Deploy PR #4821 to staging once the security scan passes, and Slack me if anything fails.”
That is the idea behind the Harness Cursor Plugin.
It connects Cursor directly to Harness, so you can trigger and manage your entire delivery workflow using natural language, right inside Cursor.

No tab switching. No manual orchestration. No guessing what is happening in the pipeline.
Once connected, you can use Cursor to interact with your delivery system just as you do with your code.
For example, you can:

This builds on what we introduced last month, Secure AI Coding, which integrates directly with Cursor and scans code at the moment of generation rather than waiting for a PR review. Developers see inline vulnerability warnings with the option to send flagged code back to the agent for remediation, without leaving their workflow. Under the hood, it leverages Harness's Code Property Graph (CPG) to trace data flows across the entire codebase, surfacing complex vulnerabilities that simpler linting tools would miss.
The key thing is that you are no longer just interacting with code. You are interacting with the entire delivery system from the same place.
One of the biggest concerns with AI in delivery is obvious:
“Are we about to let agents push code to production without guardrails?”
No.
With Harness, everything runs through the controls that you can rely on:

Instead of being manual checkpoints spread across tools, they are enforced automatically as part of the workflow while you stay in flow.
So AI can help move things faster, but it cannot bypass the governance that matters.
Most integrations today expose APIs or bolt AI onto existing systems. That is not what we wanted to do.
We designed the Harness Cursor Plugin specifically for how AI agents actually work:
Because shipping software is not a single action. It is a chain of decisions across CI, CD, security, approvals, and operations. If AI is going to help here, it needs access to that full picture. That’s where the Harness Software Delivery Knowledge Graph comes into play. It provides the necessary context for AI to take actions for you.
The knowledge graph models the relationships between services, pipelines, environments, policies, and operational signals in real time. Instead of treating each step in delivery as an isolated task, it creates a connected system of record that AI can reason over. This allows agents to understand not just what to do, but when and why to do it, based on dependencies, risk signals, and historical behavior.

In practice, this means smarter automation: deployments that adapt to context, approvals that are triggered based on policy and impact, and faster root cause analysis because the system already understands how everything is connected.
This is not just about convenience. It is a shift in how software actually moves from idea to production.
Instead of:
You get a single, connected workflow:
All accessible from your editor. Cursor accelerates the building. Harness governs the shipping. And the handoff between the two disappears.
If you want to try it:
For example:
“Run the CI pipeline for this branch, check if the security scan passed, and promote to staging if it did.”
That is it.
AI is not just changing how we write code. It is changing expectations for how fast we should be able to ship it. But speed without control does not work in real environments. What we are building toward is something simpler:
A world where every step, from PR to production, is:
Without forcing developers to leave their flow. This plugin is one step in that direction.

“We’ve been operating in a hybrid environment with both OpenTofu and Terragrunt, and Harness has made it much easier to bring those workflows together into a single, consistent platform with IaCM. The addition of Terragrunt support is a valuable step toward simplifying how we manage infrastructure at scale.”
— Lead Platform Engineer, Enterprise Customer
Infrastructure as Code is now a standard for modern cloud operations, with most enterprises using IaC to provision and manage environments. However, as adoption grows, so does complexity. Teams are no longer managing a handful of environments. They are operating across multiple regions, accounts, and services, often at massive scale.
This is where traditional approaches begin to fall short.
As organizations scale their infrastructure, Terraform alone is often not enough. Teams adopt Terragrunt to manage complex, multi-environment deployments, but they are often forced to stitch together fragmented tooling that lacks visibility, governance, and consistency.
At Harness, we are changing that.
Today, we are excited to announce native Terragrunt support in Harness IaCM, bringing it to full parity with Terraform and OpenTofu while delivering capabilities that go beyond what is available in standalone tooling. This is more than support. It is about making Terragrunt a first-class platform for enterprise infrastructure management.
With Harness IaCM, teams can now:

Terragrunt has become a critical layer for managing infrastructure at scale because it simplifies how teams structure and reuse configurations across environments. Harness builds on that foundation with deep, native integration, enabling platform teams to operate with both flexibility and control.
This is especially important for enterprises where a single deployment spans multiple environments and services. Harness abstracts that complexity while maintaining governance, auditability, and consistency.
Terragrunt is part of a broader shift toward multi-tool infrastructure strategies.
Modern teams are no longer standardized on a single IaC tool. Instead, they operate across:

This creates challenges around consistency, visibility, and governance. Harness IaCM is built for this reality. We are evolving IaCM into a unified control plane for multi-IaC workflows, where teams can manage different frameworks with a consistent experience, shared policies, and centralized visibility.
This means:
Instead of managing infrastructure in silos, teams can now operate from a single platform across the entire lifecycle.
The next phase of Infrastructure as Code is not just about supporting more tools. It is about making infrastructure systems more intelligent and automated.
We are investing in two key areas:
We are continuing to support modern frameworks like AWS CDK, enabling developer-centric infrastructure workflows alongside provisioning, configuration, and orchestration tools.
We are introducing intelligence into IaC workflows to simplify tasks such as drift management and optimization. This helps teams reduce manual effort and operate more efficiently at scale.
Together, these investments move IaCM toward a unified, multi-IaC platform that combines flexibility, governance, and automation. Terragrunt has become essential for managing infrastructure at scale, but until now it hasn’t had a platform that truly supports it. As infrastructure continues to grow in complexity, our focus remains the same: helping teams move faster, reduce risk, and scale with confidence no matter which IaC tools they use.


Human review, and AI review, can only get you so far
Let's be frank: the last few years in software engineering have been earth-shattering. The foundations of the discipline have changed. Code can be written, rewritten, tested, and shipped faster than ever before. Agents are burning through trillions of tokens, and every month they get better at turning vague intent into working software.
That is exciting. It is also destabilizing.
Many teams are still built around the assumption that every meaningful change can be understood by a human before it merges. A developer opens a pull request, a reviewer reads it, a test suite runs, and the team decides whether the change is safe enough to deploy.
That model was already under pressure before AI, but now it is breaking.
LLMs can produce code far faster than any team can review it. The volume problem is obvious: if one engineer with an agent can generate several times more change than before, the review queue grows faster than the organization can absorb. The harder problem is trust. Even when a change looks reasonable, and even when another model reviews it, the system still cannot guarantee the behavior of that change in production.
AI review does not eliminate this problem. You can ask a different model, use a different prompt, or build an entire agentic code-review workflow. That can catch real issues. It can improve consistency. It can reduce the burden on humans. But it is still a non-deterministic system evaluating the output of another non-deterministic system. It can tell you what looks wrong. It cannot prove that a change will not degrade production.
Even staging and QA only get you so far. A non-production environment is not, and cannot be, exactly the same as production. It will not have the same traffic shape, data distribution, customer behavior, integrations, timing, scale, noisy neighbors, or failure modes. The closer you make it, the more useful it becomes, but it is still a model of production. It is not production.
So the question is not, "How do we review everything perfectly?"
The better question is, "How do we release in a way that assumes review is imperfect?"
Would you believe that one of the best answers to this problem has existed for a long time?
In December 2009, Flickr published an unassuming engineering post called Flipping Out. The idea was simple: release new features without deploying new code for every feature launch. Flickr described a model where code was merged continuously, deployed from the main branch, and gated behind small runtime switches. A feature could exist in production but remain unavailable until a configuration value flipped it on.
At first, that may not seem directly related to AI-generated code. But follow the thread.
What Flickr was describing is what we now call feature flagging. Combined with trunk-based development, feature flags let teams deploy code continuously without releasing every behavior immediately. The key distinction is simple but profound: deployment and release are not the same thing.
Deployment is getting code into an environment.
Release is exposing behavior to users.
Those two actions are often treated as one event, but they do not have to be. Feature flags are a way to choose between code paths at runtime and explicitly decouple deployment from release. With AI-accelerated engineering, that separation becomes a basic safety requirement.
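In its simplest form, the separation looks like the sketch below. The `FLAGS` dictionary and the `checkout` function are hypothetical; a real system would read flag state from a flag service rather than an in-process dict. Both code paths are deployed, and only the flag value decides which one is released.

```python
# Hypothetical in-memory flag store; stands in for a runtime configuration source.
FLAGS = {"new_checkout": False}


def is_enabled(flag, flags=FLAGS):
    """Unknown flags default to off, so undeployed behavior stays hidden."""
    return flags.get(flag, False)


def checkout(cart_total, flags=FLAGS):
    # Both paths ship in the same deployment; the flag picks one at runtime.
    if is_enabled("new_checkout", flags):
        return f"new flow: {cart_total:.2f}"
    return f"legacy flow: {cart_total:.2f}"
```

Flipping `new_checkout` to `True` changes behavior without a new deployment, and flipping it back is the rollback.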
If AI can generate more changes than humans can manually reason through, then the release system has to become more empirical. It has to answer: what is this feature actually doing to real users, real systems, and real business metrics?
Hiding unfinished work behind if statements is only the beginning. The real value is controlled exposure. A feature can be deployed to production, then released first to internal testers. Then to one percent of users. Then five. Then ten. At every step, you observe the impact before deciding whether to continue.
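A common way to implement this kind of staged exposure is deterministic hashing, sketched below. This is a generic technique and not a description of how Harness FME assigns users; the function names and the `internal_testers` parameter are assumptions for illustration. Hashing the flag name together with the user ID gives each user a stable bucket, so raising the percentage only ever adds users.

```python
import hashlib


def bucket(user_id, flag):
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def in_rollout(user_id, flag, percent, internal_testers=frozenset()):
    """Internal testers always see the feature; others depend on their bucket."""
    if user_id in internal_testers:
        return True
    return bucket(user_id, flag) < percent
```

Because a user's bucket never changes, anyone included at one percent is still included at five and ten, which keeps the experience consistent as the rollout widens.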
Production is where the unknowns live. Your tests can tell you whether the code behaves as expected in known scenarios. Your reviewers can tell you whether the change looks reasonable. Your static analysis tools can tell you whether it violates known rules. But only production can show you whether the change behaves well under the messy reality of actual usage.
Most teams already have observability. They have dashboards, logs, traces, alerts, and APM tools. You still need all of that, but aggregate system health is a blunt instrument when the risk is tied to one feature in a partial rollout.
APM tools are usually excellent at telling you something changed in the system. They are much less reliable at telling you which feature caused the change, especially during progressive delivery.
Imagine an AI-generated change increases crash rate by 10 percent for users who receive it. If that feature is only enabled for five percent of traffic, the total crash rate across the whole application may move by only half a percent. That can look like noise. It may not page anyone. It may not even be visible until the rollout expands to 20, 30, or 50 percent of traffic.
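The dilution in that example is simple arithmetic, worked through below under one stated assumption: exposed and unexposed users share the same baseline crash rate. The function name and the baseline value are illustrative only.

```python
def aggregate_rate(baseline, exposure, relative_change):
    """Blended rate when a fraction `exposure` of traffic sees a relative change."""
    exposed_rate = baseline * (1 + relative_change)
    return (1 - exposure) * baseline + exposure * exposed_rate


baseline = 0.02  # hypothetical 2% crash rate
blended = aggregate_rate(baseline, exposure=0.05, relative_change=0.10)
relative_move = blended / baseline - 1  # about 0.005, i.e. half a percent
```

Algebraically, the blended rate is `baseline * (1 + exposure * relative_change)`, so the aggregate relative move is just the product of the two: 5 percent of traffic times a 10 percent increase.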
Harness FME Release Monitoring is designed around that gap. Rather than looking only at aggregate platform health, Release Monitoring measures the impact of feature flags and experiments on performance and behavioral metrics. If multiple features are rolling out at once, you do not want to know only that the application got worse. You want to know which feature is responsible, which users saw it, and which metric moved.
Code review does not go away. Human review still matters. AI review still helps. Tests still matter. Security scanning still matters. Production metrics add the control those systems cannot provide on their own: measured impact.
In Harness FME, metrics evaluate the impact of feature flags and experiments on user behavior and system performance. They can measure errors, conversions, page load performance, interactions, satisfaction, sessions, shopping cart behavior, and any other event stream that matters to the product.
"Safe" is not a purely technical word. Depending on the feature, safety might mean error rates stay flat, page loads do not slow down, conversion does not drop, support tickets do not spike, or customers do not start rage clicking their way through a broken flow.
The right guardrails depend on the feature. Engineering leadership may care about latency and error rate. Product leadership may care about adoption and retention. Support may care about ticket volume. The power of a metric-driven release process is that all of those concerns can be defined before the rollout, measured during the rollout, and used to decide whether the feature keeps moving forward.
That changes the AI conversation. Reviewers are no longer being asked to predict every possible effect of a change from the diff alone. The release system is responsible for measuring the effects that actually matter.
Once metrics are attached to a rollout, the next step is automation.
Harness FME alerts and monitoring can notify teams when metrics cross critical thresholds or when statistically significant impact is detected on key or guardrail metrics. If the impact is negative, the team can stop the rollout, kill the flag, and investigate with a much narrower blast radius than a traditional deploy-and-pray release.
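For intuition about what a "statistically significant" guardrail breach means, here is a simplified sketch using a pooled two-proportion z-test. This is a textbook test chosen for illustration, not the method Harness FME uses, and it ignores the corrections a real system needs when metrics are checked repeatedly during a rollout. The function names and the default threshold are assumptions.

```python
import math


def two_proportion_z(bad_treat, n_treat, bad_ctrl, n_ctrl):
    """z-score for the difference in error rates between treatment and control."""
    p_treat = bad_treat / n_treat
    p_ctrl = bad_ctrl / n_ctrl
    pooled = (bad_treat + bad_ctrl) / (n_treat + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl))
    if se == 0:
        return 0.0  # no variation observed, so no evidence either way
    return (p_treat - p_ctrl) / se


def should_kill(bad_treat, n_treat, bad_ctrl, n_ctrl, z_threshold=2.58):
    """Flag the rollout when treatment is significantly worse than control."""
    return two_proportion_z(bad_treat, n_treat, bad_ctrl, n_ctrl) > z_threshold
```

With 150 errors in 5,000 treated sessions against 100 in 5,000 control sessions, the z-score is a little above 3 and `should_kill` returns `True`; with 101 against 100 it does not.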
The operational model starts to look different:
That loop is much more realistic for the AI era than pretending review can scale linearly with code generation.
With FME pipelines, this can also become part of the delivery workflow itself. Harness pipelines can include FME steps for operations like creating or updating feature flags, changing rollout behavior, modifying targets, setting default allocations, and killing a flag. Feature release can move from an ad hoc manual process to an auditable automation path.
AI velocity does not need chaos with better dashboards. It needs disciplined automation with measurable gates.
Software engineering has changed permanently. The amount of code that can be produced by a small team is going up. The number of ideas that can be prototyped is going up. The number of changes waiting to be reviewed, validated, merged, and released is also going up.
But some things have not changed.
Production is still the only environment that is truly production. Users still behave in ways you did not predict. Distributed systems still fail in ways your test plan did not imagine. Business metrics still matter more than whether the diff looked elegant.
So yes, keep reviewing code. Use AI reviewers where they help. Keep improving tests. Keep scanning for vulnerabilities. Keep investing in non-production environments.
None of that is proof by itself.
When features are being written faster than humans can comprehensively review them, the release process has to become empirical. Put the code behind a flag. Release it progressively. Measure the impact per feature. Alert on guardrails. Kill the feature when the data says it is hurting users.
In the age of the LLM, the proof is in production.


Infrastructure provisioning is no longer the hard part.
Most engineering organizations have already standardized on Infrastructure as Code (IaC), GitOps workflows, Terraform or OpenTofu, and CI/CD pipelines. Provisioning cloud infrastructure has become relatively repeatable.
But operating infrastructure at scale remains deeply fragmented.
That’s the tension platform engineering teams are now dealing with: infrastructure doesn’t typically fail during provisioning anymore. It fails after deployment, through drift, inconsistent runtime configuration, policy violations, and unmanaged operational changes.
As cloud environments become more dynamic, traditional infrastructure automation models are showing their limits.
During the recent Harness webinar Designing a Control Plane for Cloud Infrastructure, Rohit, Product Manager for IaCM at Harness, and Mrinalini Sugosh, Product Marketing Manager at Harness, outlined why platform teams are shifting from static provisioning workflows toward continuous infrastructure control. That shift fundamentally changes how platform engineering teams need to think about governance, self-service, and infrastructure operations.
The industry has spent the last decade solving infrastructure provisioning.
Terraform, OpenTofu, GitOps workflows, CI/CD automation, and cloud-native APIs dramatically improved infrastructure consistency and repeatability. Most teams can now provision infrastructure reliably through declarative workflows.
But provisioning is only one moment in the infrastructure lifecycle.
Modern environments continuously change:
That distinction matters because most IaC pipelines still operate like transactional systems:
The problem is that cloud infrastructure does not remain static after deployment.
Traditional infrastructure workflows validate infrastructure at a single point in time. Modern infrastructure requires continuous observation and enforcement.
Infrastructure drift is no longer an edge case.
It’s the default operating condition for most large-scale cloud environments.
A developer updates a security group directly in AWS during an incident. An engineer modifies a Kubernetes runtime configuration outside GitOps. A platform team upgrades infrastructure dependencies manually to unblock production.
The infrastructure technically “works,” but the declared state and actual state no longer match.
Over time, that creates:
Rohit described this reality during the webinar as the “glass break” problem:
“In incident scenarios, the instinct is to fix things with ClickOps, the easiest way possible, which leads to drift if not remediated after the incident.”
Most organizations attempt to solve this operationally through:
But fragmented tooling compounds the problem.
Infrastructure provisioning, runtime configuration, deployment workflows, security scanning, and self-service portals often evolve independently. Each layer introduces its own operational logic, approval models, and governance controls.
Eventually, the platform itself becomes the source of complexity.
A control plane changes the operating model.
Instead of treating infrastructure governance as a one-time validation step, platform teams move toward continuous governance:
This is the difference between infrastructure automation and infrastructure operations.
According to the webinar speakers, modern control planes are designed to unify several traditionally disconnected functions into a single operational layer, including infrastructure provisioning, runtime configuration management, policy enforcement, cost governance, drift detection, security scanning, self-service infrastructure workflows, and deployment orchestration. The major architectural shift is that governance is no longer treated as a separate overlay added after deployment, but instead becomes embedded directly into the system itself, including at the design stage.
This approach enables organizations to enforce controls such as blocking unsupported OpenTofu versions, preventing GPU provisioning in development environments, enforcing tagging standards, validating security posture before provisioning, and surfacing projected infrastructure cost changes during approval workflows. As Rohit explained, “You want these gates as part of the release process rather than as an afterthought in production.” This philosophy aligns closely with modern platform engineering models, where governance is automated, centralized, and reusable across teams and environments.
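The controls listed above can be pictured as checks that run against a proposed plan before anything is provisioned. The sketch below is illustrative only: the plan structure, the version allow-list, and the required tag names are all hypothetical, and in practice such rules would typically be written as policy-as-code (for example in OPA) rather than in application code. Python is used here just to show the logic.

```python
# Hypothetical allow-list and tagging standard for this example.
SUPPORTED_TOFU_VERSIONS = {"1.8", "1.9"}
REQUIRED_TAGS = {"owner", "cost-center"}


def evaluate_plan(plan):
    """Return a list of policy violations for a proposed provisioning plan."""
    violations = []
    if plan["tofu_version"] not in SUPPORTED_TOFU_VERSIONS:
        violations.append(f"unsupported OpenTofu version {plan['tofu_version']}")
    for res in plan["resources"]:
        if plan["environment"] == "dev" and res.get("gpu", False):
            violations.append(f"{res['name']}: GPU instances are not allowed in dev")
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            violations.append(f"{res['name']}: missing tags {sorted(missing)}")
    return violations
```

An empty result lets the pipeline continue to approval; any violation blocks the run at design time, which is the "gate as part of the release process" idea from the quote.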
Most enterprises still manage infrastructure provisioning and runtime configuration through separate operational systems. Infrastructure is commonly provisioned with Terraform, runtime environments are configured with Ansible, deployments are managed through CI/CD pipelines, and security tooling operates independently from the rest of the delivery process. This fragmented approach creates operational silos, duplicate governance workflows, policy inconsistencies, fragile integrations, and significant platform maintenance overhead.
Modern control planes address this problem by consolidating these functions into a unified operational model. During the webinar, Harness demonstrated how OpenTofu and Terraform provisioning, Ansible configuration management, CI/CD orchestration, security scanning, approval workflows, cost visibility, and drift monitoring can all operate within a single system. By reducing the amount of platform “wiring” required between tools, organizations can establish more consistent governance patterns across the entire software delivery lifecycle while simplifying operational management.
This approach also aligns with broader trends in continuous testing in CI/CD, AI-driven software delivery, and GitOps deployment automation, where operational consistency and automation become foundational platform capabilities.
Governance at scale cannot rely on tribal knowledge or manual review processes. High-performing platform engineering teams operationalize governance through reusable policies, standardized templates, and inheritance-based control models that can be applied consistently across environments and teams.
The webinar highlighted several examples of this model in practice, including OPA policy enforcement at the account, organization, and project levels, design-time validation before provisioning, embedded security scanning with tools such as Checkov, approval gates enriched with cost and compliance data, and reusable “golden provisioning pipelines.” These capabilities demonstrate how governance can be integrated directly into platform workflows instead of being treated as a separate operational layer.
Manual governance processes do not scale effectively in modern infrastructure environments. Policy-as-code approaches allow platform teams to standardize controls globally while still preserving flexibility for individual development teams. This reduces approval bottlenecks, accelerates compliance workflows, and increases developer autonomy without compromising security or operational consistency.
Well-designed guardrails often improve delivery speed rather than slowing it down because developers can operate within predefined safe boundaries. This principle has become central to modern platform engineering, where governance is designed to be automated, centralized, and reusable across the organization.
Many infrastructure as code systems still approach drift detection reactively, and in some environments, drift may go undetected entirely. Modern control planes instead provide continuous monitoring of infrastructure state and compare deployed resources against declared configurations in real time.
Harness demonstrated several capabilities designed to improve operational visibility and auditability, including full infrastructure state version history, attribute-level drift visibility, continuous monitoring for external configuration changes, and historical comparisons across versions. These features help platform teams identify configuration deviations earlier while also improving traceability during incident investigations and operational reviews.
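To make "attribute-level drift visibility" concrete, here is a minimal sketch of the underlying comparison: diff the declared attributes of a resource against what is actually deployed. The function name, the resource attributes, and the dictionary shapes are assumptions for illustration only; this is not how Harness implements drift detection.

```python
from typing import Any, Dict, Tuple


def attribute_drift(
    declared: Dict[str, Any], actual: Dict[str, Any]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (declared_value, actual_value)} for every mismatch."""
    missing = object()  # sentinel for attributes present on only one side
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want = declared.get(key, missing)
        have = actual.get(key, missing)
        if want != have:
            # Report None for a side where the attribute does not exist.
            drift[key] = (
                None if want is missing else want,
                None if have is missing else have,
            )
    return drift


# Hypothetical resource: one attribute changed, one added outside of IaC.
declared = {"instance_type": "t3.medium", "encrypted": True}
actual = {"instance_type": "t3.large", "encrypted": True, "public_ip": "203.0.113.7"}
drift = attribute_drift(declared, actual)
```

Running the same comparison against each stored state version would give the historical view described above.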
More importantly, continuous drift monitoring enables organizations to move toward proactive remediation models rather than depending entirely on manual operational intervention. As infrastructure environments continue to scale, automated drift detection and remediation are becoming increasingly important because manual review processes cannot keep pace with the volume and complexity of modern cloud infrastructure.
Self-service infrastructure without governance often leads to uncontrolled infrastructure sprawl, which is one reason many Internal Developer Portal initiatives struggle after initial adoption. Exposing powerful infrastructure capabilities without consistent operational guardrails can create additional complexity instead of improving developer productivity.
Modern platform engineering requires organizations to balance several competing priorities simultaneously, including developer autonomy, operational consistency, security requirements, cost governance, and compliance enforcement. The most effective platform teams solve this challenge through standardized operational patterns such as golden templates, centralized policy inheritance, reusable provisioning pipelines, embedded approval workflows, and carefully controlled abstractions.
This model allows developers to provision and manage infrastructure independently while still operating within safe and compliant boundaries. By embedding governance directly into self-service workflows, organizations can improve developer experience without requiring every engineering team to develop deep expertise in the underlying complexity of cloud infrastructure and platform operations.
Infrastructure automation solved provisioning.
Platform engineering now needs to solve operations.
That requires shifting from:
The control plane model reflects that evolution.
It’s not simply another IaC orchestration layer.
It’s an operational framework for continuously governing infrastructure delivery across provisioning, configuration, deployment, security, and self-service systems.
As infrastructure complexity grows, this architectural shift is becoming less optional.
It’s becoming foundational to how modern platform engineering organizations operate at scale.
An infrastructure control plane is a centralized operational system that continuously manages provisioning, governance, policy enforcement, drift detection, and infrastructure lifecycle workflows across cloud environments.
Infrastructure as Code defines desired infrastructure state. A control plane continuously observes, governs, validates, and operationalizes infrastructure after deployment.
Drift creates inconsistencies between declared infrastructure and actual runtime environments, increasing security risk, operational instability, audit failures, and troubleshooting complexity.
Platform engineering teams create standardized workflows, templates, guardrails, and self-service systems that allow developers to provision infrastructure safely and consistently.
Control planes provide reusable templates, embedded governance, and policy enforcement that allow developers to self-service infrastructure without introducing operational risk.
Golden paths are standardized workflows, templates, and operational patterns that simplify software delivery while enforcing security, governance, and operational best practices.
Without governance, self-service platforms can increase infrastructure sprawl, security gaps, and operational inconsistency by exposing powerful infrastructure workflows without guardrails.
Harness combines Infrastructure as Code Management (IaCM), Internal Developer Portals (IDP), CI/CD, governance, security scanning, and drift detection into a unified software delivery platform.
Cloud infrastructure has evolved far beyond static provisioning workflows, making infrastructure deployment alone insufficient for maintaining governance, operational consistency, security, and reliability at scale. Modern platform engineering teams require systems that continuously observe infrastructure state, enforce policies, validate configurations, detect drift, and operationalize governance throughout the entire infrastructure lifecycle rather than only during deployment events. This shift is driving the emergence of infrastructure control planes as a foundational operating model for modern platform teams. By embedding governance, automation, visibility, and self-service capabilities directly into infrastructure workflows, organizations can improve developer autonomy while maintaining centralized operational control. Solutions such as Harness Infrastructure as Code Management and Internal Developer Portal capabilities are designed to help platform teams operationalize continuous governance, proactive drift detection, and scalable self-service infrastructure delivery across increasingly complex cloud environments.


Let's face it: "move fast and break things" is a great way to end up sitting in a war room at 3:00 AM. Engineer burnout is at record highs, and we don’t need sloppiness to hurt us further.
Look. Here’s the reality: thanks to AI code generation tools, we are writing more code than ever before. Delivering that with pipelines built for human-speed development? That’s become the chokepoint. Everything in delivery needs to get faster and better. That includes governance.
We’ve long used Open Policy Agent (OPA) to embed automated governance directly into delivery pipelines to stop teams from cutting corners. OPA is policy as code, and by default it evaluates on our secure cloud infrastructure. But for large, highly regulated enterprises, corporate firewalls and strict data residency rules present a classic dilemma:
What happens when a policy needs to access data that resides within a corporate firewall? How do we run these policies so that they connect to internal systems securely and access that data within the corporate trust boundary?
We’re tackling that challenge now. New to Harness is the ability to evaluate OPA Policies on Local Infrastructure.
Platform and security engineering teams love OPA because it allows them to gate pipelines based on real-time business logic. For example, you may want to implement a waiver or exceptions workflow that grants a one-time exception when a specific policy is violated. And you may want to record that the waiver was issued in a ticketing system like ServiceNow.
However, executing this evaluation in a standard SaaS model breaks down when:
Historically, teams had to choose between drilling holes in their firewall, duplicating infrastructure, or reverting to manual spreadsheets and agonizing verification meetings.
With this new capability, Harness lets you direct the OPA evaluation engine to run in your own environment (specifically on your local Kubernetes clusters).
Instead of pulling your secure internal metrics out to the cloud for policy validation, Harness sends the evaluation intent down to your local cluster. The evaluation triggers locally, pulls secrets natively from your secure environment, queries your private behind-the-firewall tools, and passes a simple, immutable Pass/Fail status back to the Harness pipeline.

This approach delivers the best of both worlds: the ease and scalability of a unified platform control plane, backed by the absolute security of local execution.
Consider a classic enterprise scenario: gating a production deployment based on an internal ticketing system.
If the ticket is approved, the sync proceeds automatically. If the ticket is canceled, pending, or in an unexpected state, the pipeline halts or triggers an automated rollback strategy before any risk is introduced to production. Because the execution stays within your perimeter, your ticketing credentials remain entirely untouched by external systems.
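The gate logic above is easiest to reason about when it fails closed. Here is a minimal sketch of that decision, assuming the ticket state arrives as a plain string from your internal ticketing system. The `GateDecision` enum and `gate_on_ticket` function are hypothetical names, and in practice this check would live in a Rego policy rather than Python.

```python
from enum import Enum
from typing import Optional


class GateDecision(Enum):
    PROCEED = "proceed"
    HALT = "halt"


def gate_on_ticket(ticket_state: Optional[str]) -> GateDecision:
    """Fail closed: only an explicit 'approved' state lets the deployment proceed."""
    if ticket_state is not None and ticket_state.strip().lower() == "approved":
        return GateDecision.PROCEED
    # Canceled, pending, missing, or unexpected states all halt the pipeline.
    return GateDecision.HALT
```

The design choice worth copying is the default: anything that is not a clear approval, including a lookup that returned nothing, halts the pipeline.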
Check out this quick demo video to see exactly how to configure your Kubernetes cluster to handle OPA evaluations locally:
A common pattern among our customers was the need for an “exceptions” or “waiver” workflow that lets teams waive a failed OPA policy in a particular scenario. Let’s take the following example:
In these kinds of situations, teams often want a mechanism that lets the pipeline run this one specific time due to special circumstances. Additionally, customers want to record that a waiver was issued in a third-party ticketing system (like Jira or ServiceNow). With the Local OPA evaluations capability, you can now write policies that query the internal ticketing system as shown above.
Another common authorization workflow we saw was customers trying to ensure that their pipeline YAMLs hadn’t been tampered with. For example, customers often want to ensure that the pipeline they have authored and stored in Harness SaaS is exactly the one that runs at the time of deployment. They want to ensure that no third party tampers with the pipeline YAML before it is actually being run. The approach we saw customers take was the following:
The steps outlined above ensure that nobody has tampered with the pipeline’s YAML before it is run. However, to write a Rego policy that can actually do a hash code equivalence check (step 4), you need to make a call to the internal database system where the hash code of the correct pipeline lives. This again requires the Rego policy to read credentials and connect to a third-party system. Again, one way to solve this problem was to allow customers to run these OPA policies on their own K8s clusters.
Finally, some customers use our custom policy step action to perform an authorization check midway through a pipeline. In several of these situations, the data customers want the OPA policy to check is sensitive in nature. For such use cases, they don’t want the sensitive payload to be sent to the OPA service running in Harness SaaS. Instead, they want the payload to be sent to the OPA Rego policy running in their own infrastructure.
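The equivalence check itself is small. Below is a minimal sketch, assuming SHA-256 as the hash and assuming the trusted digest has already been fetched from your internal database. Both function names are hypothetical, and the source does not say which hash algorithm customers actually used.

```python
import hashlib
import hmac


def pipeline_digest(pipeline_yaml: str) -> str:
    """Hex SHA-256 digest of the pipeline YAML exactly as authored."""
    return hashlib.sha256(pipeline_yaml.encode("utf-8")).hexdigest()


def is_untampered(pipeline_yaml: str, trusted_digest: str) -> bool:
    """Compare the digest of the YAML about to run with the stored trusted digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(pipeline_digest(pipeline_yaml), trusted_digest)
```

Note that any byte-level change, including whitespace, produces a different digest, so the YAML must be hashed in the same form at authoring time and at run time.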
So, what does this mean for your daily operations?
The beauty of local OPA evaluation is that your developers won't notice a single change in their daily workflow. They continue to leverage the fastest builds and automated continuous delivery pipelines they love.
Meanwhile, Platform Leaders gain a comprehensive, immutable audit trail of every single evaluation, ensuring painless compliance reviews without hampering developer velocity.
Ready to eliminate toolchain chaos and secure your deployment guardrails? Get started with Harness Continuous Delivery & GitOps today.


The Shai-Hulud lineage has a new face. On June 1, 2026, security teams independently flagged a fresh supply chain compromise inside the @redhat-cloud-services npm namespace. 32 packages and 96 versions were all republished with a credential-stealing worm.
These aren't typosquats. They are the official packages in a trusted scope, pulling somewhere between 80,000 and 117,000 average weekly downloads. This article walks through how one compromised maintainer account turned Red Hat's own CI/CD pipeline into a malware channel, what is actually new under the hood versus earlier Shai-Hulud waves, and how to clean it up without tripping the worm's self-destruct.
Open-source ecosystems run on trust and most of that trust is now automated. A modern build pulls hundreds of transitive dependencies, publishes through CI/CD with nobody watching and checks provenance to prove an artifact came from where it claims. Provenance can tell you where a package was built but can't tell you if the build environment was clean.
“Miasma is what happens when an attacker stops trying to fake that trust signal and just earns it from inside a pipeline that already has it.”
Miasma is a multi-stage dropper. It runs during npm installation, scans the machine and any reachable cloud for credentials, then republishes itself through every package the stolen tokens can reach. It's a direct descendant of the Mini Shai-Hulud worm. What changed is the packaging: the wrapping, the staging, and the disguise. Where Shai-Hulud used Dune references, Miasma switches to Greek mythology, naming things "spartan" and labeling its exfiltration repos Miasma: The Spreading Blight.
Here's what actually separates this wave from earlier Shai-Hulud activity:
This wasn't a stolen token push. It happened inside Red Hat's own release infrastructure. A Red Hat employee's GitHub account was taken over and used to commit straight into internal repositories, skipping the code review step entirely. Here's how it played out:

Red Hat publishes free software building blocks (called "packages") that thousands of other developers download and use in their own apps. An attacker found a way to poison those building blocks so that anyone who downloaded them would get secretly hacked. This kind of attack is called a "supply chain attack." Instead of breaking into your house, they forge the lock before it ever reaches the store.
The attacker didn't steal a password or a key. They hijacked a Red Hat employee's GitHub account and quietly slipped their own code into Red Hat's project. Normally, any code change gets reviewed by another human first. But they used a sneaky trick, orphan commits, that let the changes bypass that review, so nobody saw them go in.
The industry recently moved to a system where instead of using permanent passwords to publish software, the publishing system hands out temporary and single-use permission slips (short lived tokens).
The idea was "no permanent password to steal means safer."
But the attacker had taken over the machine that creates those permission slips. So every poisoned package came stamped with a legitimate "this was built by a trusted system" seal of approval which was technically true and completely useless because the trusted system itself was compromised.
They rigged the poisoned packages so the malicious code runs the instant you install them, before you can read the code or notice anything wrong. One giant red flag researchers point out: one of the infected packages was supposed to contain only text definitions (no programs at all), yet it was set up to run a program on install. That's like a sealed envelope that somehow starts ticking.
The actual malicious code was buried under layers of disguise. It was scrambled, encrypted and rebuilt from lists of numbers, specifically to fool automated security scanners. It also quietly downloads its own tools if your machine doesn't already have them to make sure that it works on almost any computer.
Once running, it grabs everything it can. It reads environment variables, host details, and local credential files, pulls GitHub CLI tokens with gh auth token and scans the filesystem for secrets that match known patterns. It doesn't stop at files on disk. If it has a valid identity, it queries cloud metadata services, reads from AWS Secrets Manager and SSM Parameter Store, pulls Azure Key Vault and GCP Secret Manager values and lists Kubernetes and Vault secrets. On CI runners it can even read secrets out of the runner's memory, which gets around log masking because the secret is never written to a log.
Representative token patterns searched by the payload include:

To smuggle the stolen secrets out without setting off alarms, the malware sent data to a web address that looks normal. The full address is hxxps[:]//api[.]anthropic[.]com/v1/api, which is a real Anthropic host. A plain GET to it returns Anthropic's normal 404 not_found_error, so /v1/api isn't a real route and Anthropic's systems were not compromised. The point is cover. The domain looks harmless in network logs and the path looks like an API call. It's also awkward to block, since lots of companies legitimately call Anthropic.

The malware reuses the same "GitHub dead-drop" trick from earlier Shai-Hulud versions. If it finds a working GitHub token, it uses it to create a public repo on the victim's account and saves stolen data there as JSON files (under a results/ folder, named with a timestamp and counter). The repo gets a random name in the form adjective-noun-number and its description is set to a fixed string:
“Miasma: The Spreading Blight”
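Those two markers are enough for a quick, read-only check of an account's repositories. The sketch below assumes you already have a list of repository records with `name` and `description` fields (the shape GitHub's repository listings return). The function names and the exact casing of the name pattern are assumptions, and a clean result does not prove an account is unaffected.

```python
import re
from typing import Dict, Iterable, List, Tuple

MIASMA_DESCRIPTION = "Miasma: The Spreading Blight"
# adjective-noun-number, e.g. "quiet-river-42" (letter casing is an assumption).
NAME_SHAPE = re.compile(r"^[a-z]+-[a-z]+-\d+$", re.IGNORECASE)


def looks_like_dead_drop(repo: Dict) -> bool:
    """True when the repository description is the fixed Miasma string."""
    description = (repo.get("description") or "").strip()
    return description == MIASMA_DESCRIPTION


def suspicious_repos(repos: Iterable[Dict]) -> List[Tuple[str, bool]]:
    """Return (repo name, name matches adjective-noun-number) for each hit."""
    hits = []
    for repo in repos:
        if looks_like_dead_drop(repo):
            hits.append((repo["name"], bool(NAME_SHAPE.match(repo["name"]))))
    return hits
```

The check only reads metadata. It does not delete repositories or revoke tokens, which matters given the threat marker the payload attaches to stolen tokens.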

When the payload includes a stolen token in a commit message, it uses the threat marker:
IfYouInvalidateThisTokenItWillNukeTheComputerOfTheOwner
Why it spreads like a worm: This is the nastiest part. When the malware finds credentials that can publish software, it infects those packages too and republishes them, so the infection jumps from victim to victim automatically, much as a real worm or virus spreads. Researchers found it in over 200 infected projects. It also has multiple hidden backup copies of itself buried around GitHub, so even if you clean one up, it is designed to crawl back.
The names of some of the affected packages are:
The single biggest takeaway for a normal developer: be suspicious when installing a package triggers programs to run, especially a package that has no business running anything. That `preinstall` behavior was the whole foundation of the attack.
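One way to act on that advice is to list which installed packages declare install-time scripts at all. The sketch below reads `package.json` manifests and reports the `preinstall`, `install`, and `postinstall` hooks, which are the npm lifecycle scripts that run during installation. The function names are hypothetical, and a hook is a reason to look closer, not proof of compromise.

```python
import json
from pathlib import Path
from typing import Dict

INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_hooks(package_json_text: str) -> Dict[str, str]:
    """Return the install-time lifecycle scripts declared in one manifest."""
    manifest = json.loads(package_json_text)
    if not isinstance(manifest, dict):
        return {}
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


def scan_tree(root: Path) -> Dict[str, Dict[str, str]]:
    """Map each package.json under root to its install hooks, if it has any."""
    findings = {}
    for manifest_path in root.rglob("package.json"):
        try:
            hooks = install_hooks(manifest_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest; skip it
        if hooks:
            findings[str(manifest_path)] = hooks
    return findings
```

Scanning an already-installed tree is after the fact, of course. Installing with npm's `--ignore-scripts` flag prevents these hooks from running in the first place, at the cost of breaking packages that legitimately need them.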
Because of the dead-man switch, sequence is the whole game. Work through these in order:
Harness SCS helps you quickly detect and contain compromised dependencies like the redhat-cloud-services package before they impact your pipelines. With real-time visibility into your SBOMs and dependency graph, you can identify affected versions, trace their usage across builds and environments and block them using OPA policies. This ensures malicious packages never propagate through your CI/CD or AI workflows.
Harness SCS enables instant search across all repositories and artifacts to quickly identify if compromised package versions exist in your environment. The moment such a malicious package is disclosed, you can pinpoint its presence and assess impact across your entire supply chain in seconds.
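For a single repository, the same question ("is a compromised version in here?") can be answered from the lockfile. This is a minimal sketch, not how Harness SCS works: it assumes a `package-lock.json` in the v2/v3 layout, where a top-level `packages` object is keyed by install path, and a blocklist you supply from a published advisory. The package name and version in the example are placeholders, not real indicators.

```python
import json
from typing import Dict, List, Set, Tuple


def package_name(lock_path: str) -> str:
    """'node_modules/a/node_modules/@scope/pkg' -> '@scope/pkg'."""
    return lock_path.rpartition("node_modules/")[2]


def find_blocked(
    lockfile_text: str, blocked: Dict[str, Set[str]]
) -> List[Tuple[str, str, str]]:
    """Return (name, version, install path) for every blocklisted entry."""
    lock = json.loads(lockfile_text)
    hits = []
    for path, entry in lock.get("packages", {}).items():
        if not path:  # the "" key is the root project itself
            continue
        name = package_name(path)
        version = entry.get("version", "")
        if version in blocked.get(name, set()):
            hits.append((name, version, path))
    return sorted(hits)


# Placeholder data for illustration only.
example_lock = json.dumps({
    "packages": {
        "": {"name": "my-app", "version": "1.0.0"},
        "node_modules/@example-scope/example-client": {"version": "9.9.9"},
        "node_modules/left/node_modules/@example-scope/example-client": {"version": "9.9.8"},
    }
})
example_hits = find_blocked(example_lock, {"@example-scope/example-client": {"9.9.9"}})
```

Because transitive copies appear under nested `node_modules/` paths, the path in each hit shows which dependency pulled the package in.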

Harness AI streamlines response to incidents like the redhat-cloud-services package compromise through simple natural-language prompts. With a single prompt, you can generate OPA policies to block affected versions of redhat-cloud-services packages, for example, across all pipelines, preventing malicious packages from entering builds or deployments. As new compromised versions emerge, these policies can be quickly updated to maintain strong preventive controls across your SDLC.
Harness SCS automatically detects compromised versions across both production and non-production environments. Teams can track remediation, assign fixes and monitor progress through to deployment, ensuring exposed credentials and vulnerable dependencies are addressed quickly. This end-to-end visibility helps contain the impact and prevents compromised packages from persisting in your supply chain.

The Mini Shai-Hulud worm highlights how quickly a malicious package can expose high-value secrets when embedded deep within registries and CI runners. Given its role in managing dependencies and packages across projects, the impact extends beyond code to API keys, prompt data and downstream systems, often bypassing traditional security checks.
Defending against such attacks requires more than reactive fixes. Teams need real-time visibility into dependencies, the ability to enforce policies to block compromised versions and continuous tracking to ensure remediation is complete across all environments. Harness SCS enables teams to quickly identify where affected package versions are used, prevent them from entering new builds and ensure fixes are consistently rolled out.
With these controls in place, organizations can limit credential exposure, contain threats early and secure their supply chain against attacks like the redhat-cloud-services compromise.
AI coding tools promise faster development. What they don't show you is the queue forming at the pipeline, the security scanner you bypassed to stay fast, or the cost dashboard with a line now labeled "unknown" that is steadily growing. In May, we shipped 60+ features in 31 days across the entire delivery system: not just the editor, but everything downstream of it.
Software Delivery Intelligence, Now Inside Claude (Code and Desktop)
The Harness MCP Server is now in the official Claude Connectors Directory. Developers using Claude can now discover and connect to Harness, gaining structured, real-time access to their pipelines, deployments, approvals, and delivery workflows. What makes this different from a typical API integration is what's underneath: the Harness Software Delivery Knowledge Graph, which gives Claude the context it needs to make decisions that are accurate, fast, and safe.

The MCP Server in May: From Early Access to Production-Ready
Our MCP Server is evolving fast! Seven releases across 31 days. The month started with control and safety work: configurable autonomy levels, per-session trust boundaries, human-in-the-loop execution waits, six CVEs patched, and guardrails around destructive operations. It ended with expanded reach: IaCM workspaces, full DBSchema CRUD for database operations, Ansible support, and GPT app readiness with structured output and tool annotations. If you are building agentic pipelines on top of Harness, or want your AI coding assistant to drive deployments, infrastructure changes, and database schemas without leaving the IDE, this is the server to connect to. Read the docs.
Skills Library
A curated library of skills distills common prompt patterns from internal usage into structured instruction files. The library includes security-specific skills and is packaged for use with the MCP Server, Claude Code, Cursor, and GitHub Copilot. The model follows the skill; the engineer describes what they want. Read the docs.

Google Code Wiki and Deepwiki Integration
The Harness MCP server is now indexed by Google Code Wiki and Deepwiki (Cognition/Windsurf). Devin and Windsurf users can analyze the MCP server architecture and ask questions about it directly. The Code Wiki updates automatically from commits.
AI APIs, MCP tools, and models are now first-class assets in the platform, not afterthoughts in a traditional API inventory.
Sensitive Data Detection in AI Prompts and Responses
Open any discovered AI API from the AI Assets inventory and see what sensitive data is being processed in prompts and model responses. Exposure trends, data locations, and classifications are surfaced inline. This identifies high-risk AI APIs based on actual runtime behavior, not how they are configured. Learn more.

Service, MCP Server, and Environment on Issue Details
Issue Details now surfaces exactly where an issue is occurring: which service, which MCP server, and which environment, without leaving the side sheet. Previously, pinpointing issue context required navigating across views.
Span Attributes for Live Traffic Policy Scoping
Live traffic policies now evaluate only spans that match specific attributes, such as HTTP status codes. Detections are contextual rather than applied universally to all traffic. The evidence in each detection shows which spans actually triggered it. Docs
UI for Span-Attribute-Based API Exclusion Rules
Define API exclusion rules based on span attributes directly in the UI. Select status codes or specific headers to exclude APIs from discovery, giving precise control over what appears in the API inventory.
Entity Derivation for Bot and Abuse Protection
Extract, transform, and standardize application-specific attributes from API traffic and use them in Bot and Abuse Protection policies. Previously, detection rules were limited to predefined attributes. Custom entities derived from traffic patterns can now feed directly into policy evaluation. Docs
Rule Evaluation Point Support in Exclusion Policies
Configurable rule evaluation behavior for exclusion rules enables exclusions to be applied based on your deployment model, whether through a tracing agent or Traceable Edge. Docs
Granular RBAC and Environment-Level Scoping
Environment-level scoping now covers APIs, policies, configurations, and security insights consistently across the platform. Access is restricted to authorized environments, and policy management is environment-aware. Docs
Keyless Artifact Signing
Sign and verify artifacts without managing long-lived cryptographic keys. Identity-based authentication replaces key management, eliminating the rotation burden that makes key-based signing operationally painful at scale. Docs

License Family Classification for SBOM
SBOM components are now automatically grouped by license family. Teams get a portfolio-level view of open-source license risk without reviewing individual component licenses one by one. Docs.

Typosquatting and Malicious Package Detection
Two new risk signals are now checked during OSS dependency scanning: packages named to look like popular libraries (typosquatting) and known malicious packages. Added to the existing supply chain risk checks. Docs

Flaky Test Detection (Beta)
Test Intelligence now identifies tests that pass and fail intermittently without consistent code changes as the cause. Flaky tests can be quarantined, removing them from pipeline gate decisions while tracking their instability over time. Previously, flaky tests failed pipelines with no actionable root cause. Read the docs.
Docker Connector Support for Custom Build Images
Bring Your Own Image (BYOI) workflows in Harness Cloud now support Docker connectors pointing to private registries. Teams with custom build container images hosted in private registries can use them for Harness Cloud builds without pushing to a public registry first. Release notes
Network Egress Restrictions in UI
Configure egress allow lists for Harness Cloud Linux and Windows build VMs directly from the Harness UI. Previously required manual configuration outside the product.
Test Splitting Accuracy
Test Intelligence now uses historical average durations for more balanced test parallelism. The split_tests binary previously required timing data in a specific format; it now also supports average-based timing, making accurate splitting available to more test suites.
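To see why average durations help, here is a sketch of one common way to balance shards: sort tests from slowest to fastest and always hand the next test to the least-loaded shard. This greedy approach is an illustration only, not the algorithm inside `split_tests`, and the function name and timing data are made up.

```python
import heapq
from typing import Dict, List


def split_by_average_duration(avg_seconds: Dict[str, float], shards: int) -> List[List[str]]:
    """Greedy split: longest tests first, each to the currently lightest shard."""
    if shards < 1:
        raise ValueError("shards must be at least 1")
    # Heap of (total assigned seconds, shard index); the lightest shard pops first.
    heap = [(0.0, index) for index in range(shards)]
    buckets: List[List[str]] = [[] for _ in range(shards)]
    ordered = sorted(avg_seconds.items(), key=lambda item: (-item[1], item[0]))
    for test, seconds in ordered:
        load, index = heapq.heappop(heap)
        buckets[index].append(test)
        heapq.heappush(heap, (load + seconds, index))
    return buckets


# Hypothetical historical averages, in seconds.
averages = {"test_a": 30.0, "test_b": 20.0, "test_c": 20.0, "test_d": 10.0}
buckets = split_by_average_duration(averages, shards=2)
```

With these numbers, both shards end up with 40 seconds of work. Splitting the same four tests by count alone, in listed order, would give 50 and 30.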
Connector validation tasks and SCM tasks for proxy-enabled connectors are now routed through Harness Cloud delegates, ensuring both validation and source code operations work correctly for PrivateLink setups. These are behind feature flags.
OIDC Delegate Selectors for AWS
Pass delegate selector information as AWS session tags in OIDC tokens. IAM policies can now restrict which Harness delegates execute which tasks, providing environment-level secret isolation without relying on environment naming conventions. Works across connector validation, deployment stages, and custom stages. Release notes
Dry Run Validation API
A new API endpoint validates pipeline YAML changes before they are committed to Git. Runs schema validation, template expansion, and OPA policy evaluation without executing the pipeline. Useful for pre-commit checks in IDEs or CI gates on pipeline repositories.
Soft Delete for Packages
Deleting a package or version now moves it to a recoverable state rather than removing it immediately. Teams that accidentally delete an artifact a running deployment still depends on can recover it before anything breaks. Permanent deletion is available from the same dialog when that is the intent.
Swift and Raw Package Support
Two new formats are now supported. Swift packages work with full SwiftPM compatibility: authenticate, publish, and resolve dependencies using the registry URL with no changes to existing workflows. Raw artifact storage handles arbitrary files by path: binaries, archives, reports, configuration files, anything that does not belong to a package manager ecosystem.
Dependency Firewall: Exemptions and Notifications
The Dependency Firewall now supports exemptions and policy action notifications. Whitelist trusted dependencies that should bypass firewall rules, and configure alerts that fire when the firewall blocks or flags a package. Teams get granular control over what gets blocked without having to audit the firewall log manually to know when it acted.
Audit Dashboard for Package Uploads and Downloads
A new dashboard records every package upload and download across all registries with full attribution: who performed the action, when, and on which package and version. Provisioned automatically for accounts with Artifact Registry enabled. Useful for compliance reviews, security investigations, and understanding artifact consumption patterns across teams. Release notes
Harness Code Repositories as a schema source
Harness Code Repositories can now be used as a source during DB Schema configuration and execution workflows.
Tagging Behavior
Enhanced tagging for database changesets improves consistency and traceability during migration workflows. Release notes here.
Purchase Credits API reliability
Database operations in the Purchase Credits API are now atomic, with enhanced logging for overage details during credit resets.
Software Engineering Insights is now AI DLC Insights (Development Lifecycle Insights). Cloud Cost Management is now Cloud and AI Cost Management. Both capabilities reflect an expanded scope for the existing products: AI is now a first-class dimension in both products, not a filter you apply after the fact. Read the announcement

Cost Explorer with AI/ML Workload Visibility
Cloud and AI Cost Management's Cost Explorer now surfaces AI/ML spending alongside traditional cloud costs in a unified view. As teams add GPU instances, inference endpoints, and model API spend, that usage now appears in the same dashboards as the rest of the cloud bill. Docs
Data Job Status
Real-time visibility into the cloud cost data pipeline. When billing data from AWS, Azure, or GCP is delayed, failed, or stale, the Data Job Status page now shows the actual state. Previously, stale billing data produced incorrect recommendations and anomaly alerts with no indication that the underlying data had a problem. Docs

Cost Settings for Recommendations
A rebuilt, tabbed configuration experience for AWS and Azure recommendation cost preferences. AWS supports Passthrough Cost for both uniform and mixed account configurations, with per-account cost-type visibility. Azure adds selectable options for Amortized and List Price views of recommendation costs. Release notes
AI Summaries and Insights Dashboard Enhancements
AI DLC Insights dashboards now surface AI-generated summaries alongside DORA metrics, productivity data, and workflow visualizations. The goal is to reduce the gap between "here is the chart" and "here is what to do about it." Docs
PR Cycle Time Excludes Bot-Generated Review Comments
The Productivity Insights dashboard now strips bot-generated review comments from PR Cycle Time calculations. Cycle time now reflects human reviewer activity only, which is the number that matters for understanding team throughput. Release notes
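To illustrate why filtering bot comments changes the number, here is a minimal Python sketch. The `ReviewComment` structure, the `is_bot` flag, and the simplified metric (time from PR creation to the first human review comment) are assumptions made for illustration. They are not the Harness data model or the dashboard's actual calculation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ReviewComment:
    author: str
    created_at: datetime
    is_bot: bool  # hypothetical flag; a real system might infer this from the account type


def time_to_first_human_review(
    opened_at: datetime, comments: list[ReviewComment]
) -> Optional[timedelta]:
    """Return the delay between PR creation and the first human comment, or None if there is none."""
    human_times = [c.created_at for c in comments if not c.is_bot]
    if not human_times:
        return None
    return min(human_times) - opened_at


opened = datetime(2025, 5, 1, 9, 0)
comments = [
    # A linting bot replies two minutes after the PR opens.
    ReviewComment("lint-bot", datetime(2025, 5, 1, 9, 2), is_bot=True),
    # The first human reviewer replies four hours after the PR opens.
    ReviewComment("alice", datetime(2025, 5, 1, 13, 0), is_bot=False),
]

first_review = time_to_first_human_review(opened, comments)
print(first_review)  # 4:00:00
```

If the bot comment were counted, the same PR would appear to have been reviewed in two minutes, which says nothing about reviewer throughput.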
Custom Date Range on Dashboards
All dashboards on the Insights page now support a custom date range beyond the default presets. Analyze metrics over any time window, useful for quarterly reviews, incident post-periods, and year-over-year comparisons. Docs
Enable or Disable Developer Filtering for Lead Time for Changes
Control whether Lead Time for Changes honors developer filters at the team level from Team Settings. Gives engineering teams more precision in how DORA metrics are calculated and attributed across distributed or shared-team structures. Docs
ServiceNow Integration
ServiceNow is now a data source for engineering insights. Ingest, normalize, and analyze ITSM data directly within dashboards. DORA metrics can be calculated from ServiceNow incident and change management records for teams where ServiceNow is the system of record. Docs
qTest Integration
Test management data from qTest Cloud now flows into AI DLC Insights via API key authentication. Docs
FME Policy as Code: Environments and Segments
The OPA-based policy framework for Feature Management now covers environments, segments, and segment definitions. Teams can enforce consistent governance standards across the full FME configuration surface, not just flag-level rules. Release notes
Catalog Roundup: Modeling, Connections, and Surface Area
A set of enhancements expands what the developer portal catalog can model, connect, and display. The changes are incremental, but together they close gaps that platform teams have been routing around.

Integrations Overview on Entity Pages
The entity details page now includes a dedicated card showing key integration data directly on the overview. Platform engineers and developers can see the health and status of an entity's connected integrations at a glance rather than navigating to a separate integrations view. Docs
GitHub Integration: Secondary Entity Kinds
When configuring GitHub integration, you can now select secondary entity kinds to map discovered repository entities to. The data from those kinds surfaces directly on the entity details page, giving platform teams more flexibility in how GitHub content is represented in the catalog. Docs
AI Asset Instructions Tab
Entity pages for AI Assets now include a dedicated Instructions tab that renders the associated documentation file from GitHub directly within the portal. Teams discover and read AI asset documentation without leaving the catalog. Docs
Blueprints at Organization and Project Levels
Environment Blueprints can now be created and managed at the Organization and Project scope levels, in addition to the Account level. The blueprint listing page shows the scope for each blueprint, and managed roles have been updated with the appropriate permissions at each scope.
Kubernetes Load Testing
Load tests can now run against Kubernetes infrastructure. Previously, load tests required Linux infrastructure, meaning chaos testing and load testing needed different tooling and separate infrastructure even when targeting the same cluster. Resilience testing is now fully Kubernetes-aware end to end. Docs
Chaos Enhancements
A set of improvements landed across the chaos platform this month: filtering support for chaos experiment lists in the REST API, step name editing in Chaos Studio, NOT_EQUAL_TO operator for ChaosGuard namespace label selectors, tag-based filters on the DR Tests screen, probe chain logic, DR Test ACL permissions and audit events, user-based filters in the Experiments API, support for output variables in chaos resources, and the Chaos NG experience reaching general availability. Release notes
Playwright Execution Service (Beta)
Harness AI Test Automation now runs native Playwright test suites directly on the platform. Your playwright.config, spec files, and package.json scripts work as-is: connect your repo, point to your project root, and run. No grids to configure, no browser images to maintain, no infrastructure to scale. Tests run in the cloud with parallel workers out of the box.
When tests fail, Harness automatically classifies the failure as regression, flaky, performance, or environment issue, so engineers spend time fixing problems instead of determining whether a problem is real. Playwright runs are first-class pipeline steps: results live in the Tests tab alongside build and deploy stages, and tests block deployments by default. Existing Playwright investments stay intact; scripts can evolve into AI-generated intent-based tests gradually when teams are ready.
Available now in beta. Release notes | Docs | Blog
CEL Expression Engine
Common Expression Language is now the full expression engine for AI SRE runbook conditions. Write dynamic conditions using regex matching, datetime formatting, list comprehensions, and math anywhere logic is evaluated or data is transformed. Docs
Google Chat Integration
Teams using Google Workspace can now run incident response from Google Chat: dedicated incident spaces, bidirectional message mirroring between the AI SRE UI and Google Chat, automatic responder adds, and real-time incident timeline sync. Built on Pub/Sub for reliable message delivery. One-time admin setup per organization. Docs
Service Account Token Notifications
Configure alerts for service account token events: creation, rotation, updates, expiration, deletion, and upcoming expiration. Delivered across notification channels already configured in your account. Expiring service account tokens are a common cause of silent pipeline failures; this makes them visible before they cause an outage. Docs
Platform Alerts
An in-app notification framework now surfaces important account-level events automatically within the Harness UI: approaching resource limits, system release announcements, and other account-wide signals. No external configuration required. Docs
The teams compounding fastest on AI are the ones where the whole system accelerated, not just the part that writes code. May brought 60+ feature releases, a Skills Library that makes any AI coding assistant fluent in Harness, artifact registries that know what they are serving and to whom, and the first dashboards that connect AI spend to AI output. The bottleneck keeps moving. We help you unblock the bottleneck in your software delivery.
See you in June.


Key Takeaway: The Harness MCP Server is now in the official Claude Connectors Directory. Developers using Claude can now discover and connect to Harness, gaining structured, real-time access to their pipelines, deployments, approvals, and delivery workflows. What makes this different from a typical API integration is what's underneath: the Harness Software Delivery Knowledge Graph, which gives Claude the context it needs to make decisions that are accurate, fast, and safe.
AI agents are only as good as the context they operate in. That's not a design philosophy. It's a practical constraint. An AI agent that doesn't understand how the underlying software delivery entities relate to each other, or what the data actually means, will get things wrong. In software delivery, wrong looks like a botched deployment, a misread failure, or an approval granted when it shouldn't have been, which directly affects your users.
Today, we're announcing that the Harness MCP Server is in the official Claude Connectors Directory, making Harness discoverable and connectable for every team using Claude. But the announcement isn't really about the directory listing. It's about what Harness + Claude can actually do in your delivery system.

Claude can work across the full Harness delivery platform:

All of it is grounded in the Knowledge Graph: not raw API responses, but a structured model of your delivery system that Claude can reason over precisely.
MCP lets AI models call external tools by reading API descriptions and deciding which to invoke. That flexibility is useful. But when you're building an agent that needs to reason across an entire software delivery lifecycle, CI, CD, security scans, approvals, feature flags, cost signals, and environments, raw API access creates a deep reliability problem.
Consider a question a platform engineering lead might ask:
"Show me the pipelines with the highest failure rate over the last 30 days, and for each one, tell me which services they deploy and whether any of those services have open critical vulnerabilities."
That question spans four domains: pipeline execution history, service-to-pipeline relationships, environment state, and security scan results. An agent working off raw APIs has to discover which APIs exist across each domain, call them in the right order, paginate correctly, infer how field names correspond across systems, and synthesize the results without misinterpreting nested objects or guessing at relationships.
The result is 5+ sequential LLM calls, hundreds of thousands of input tokens, high latency, and an agent that had to guess at every join. Guessing is where hallucinations happen.
The Harness Software Delivery Knowledge Graph is a purpose-built model of everything that happens after code is written: builds, test runs, deployments, approvals, security scans, environment states, feature flags, infrastructure changes, cost signals, and rollbacks. Not as raw data but as a connected, typed, semantically annotated graph of entities and relationships.
Every field in the graph carries metadata that tells an agent exactly how to use it: whether a value is a number or a string, whether it can be aggregated or only filtered, what its unit is, and how it joins to related entities. Cross-module relationships, between a pipeline and the services it deploys, between a deployment and the security scan results for that artifact, between an environment change and the cost anomaly that followed, are explicitly declared, not inferred.
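As a rough illustration of what field-level metadata makes possible, here is a minimal Python sketch. The `FieldMeta` class, the `pipeline_execution` schema, and all field names are hypothetical and invented for this example. They do not reflect the actual Knowledge Graph schema or any Harness API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldMeta:
    name: str
    data_type: str                  # e.g. "number" or "string"
    aggregatable: bool              # False means the field may only be used in filters
    unit: Optional[str] = None
    joins_to: Optional[str] = None  # related entity type, if a join is declared


# Hypothetical schema for a "pipeline_execution" entity.
PIPELINE_EXECUTION = {
    "duration": FieldMeta("duration", "number", True, unit="seconds"),
    "status": FieldMeta("status", "string", False),
    "service_id": FieldMeta("service_id", "string", False, joins_to="service"),
}


def validate_aggregation(schema: dict[str, FieldMeta], field: str) -> FieldMeta:
    """Reject unknown or non-aggregatable fields before a query is built."""
    meta = schema.get(field)
    if meta is None:
        raise KeyError(f"unknown field: {field}")
    if not meta.aggregatable:
        raise ValueError(f"field '{field}' can be filtered but not aggregated")
    return meta


print(validate_aggregation(PIPELINE_EXECUTION, "duration").unit)  # seconds
```

The point of the sketch is that an agent working against declared metadata fails fast on an invalid field instead of guessing: asking to average `status` raises an error before any query runs.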
This is the difference between an agent that can access your delivery system and one that understands it.
When Claude connects to Harness via MCP, it doesn't receive a set of API endpoints. It gets access to a structured model of your entire delivery organization, one where the relationships are known, the data types are enforced, and the agent can construct precise queries rather than guessing at field semantics.
The practical effect with Harness + Claude: that same cross-domain question above becomes 2–3 structured queries against a known schema. The agent selects the right entity types from the graph, generates queries with exact fields and declared relationships, and returns a deterministic answer. No guesswork. No hallucinated field names. No silent wrong answers.
A build has failed. Normally, you'd open the Harness UI, navigate to the execution, copy the relevant logs, paste them into a conversation, and wait for analysis. The AI reasons over whatever you managed to capture.
With the Harness MCP connection active in Claude, you ask what failed. Claude doesn't just pull logs; it queries the Knowledge Graph to understand the structure of that pipeline, which stage failed, what services were involved, whether similar failures have occurred before, and what changed since the last successful run. The answer it surfaces reflects the full delivery context, not just the stack trace you happened to copy.

Your team is ready to move a service from staging to production. Claude checks the current environment state, verifies that required approval gates have been satisfied, confirms the security scan passed for the artifact version you're promoting, and initiates the deployment — with every action running through your existing RBAC policies and logged for audit.
The agent isn't guessing about whether conditions are met. It's querying a graph where those conditions are modeled as typed relationships with known states. The answer is deterministic because the data is structured to make it so.
The natural question when Claude can trigger pipelines and manage deployments: what stops it from doing something it shouldn't?
The same controls that govern everything else in Harness. Every action taken through the MCP server runs through your existing RBAC permissions, OPA policy enforcement, approval gates, and audit logging. Claude operates with exactly the permissions you have, nothing more. Every action is tracked. Nothing bypasses the governance layer.
The Knowledge Graph reinforces this: because Harness AI understands your delivery system structurally, it also understands the constraints within it. Approval gates aren't optional steps the agent might skip; they're modeled as typed relationships with state. The agent can't promote past a gate that hasn't cleared because the graph reflects that clearly.
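The idea of a gate as a typed entity with explicit state can be sketched in a few lines of Python. The `GateState` values, the `ApprovalGate` class, and the gate names below are hypothetical and chosen for illustration. They are not the Harness model, and real enforcement happens in the platform's governance layer rather than in client code.

```python
from dataclasses import dataclass
from enum import Enum


class GateState(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalGate:
    name: str
    state: GateState


def blocking_gates(gates: list[ApprovalGate]) -> list[str]:
    """Return the names of gates that have not been approved."""
    return [g.name for g in gates if g.state is not GateState.APPROVED]


def can_promote(gates: list[ApprovalGate]) -> bool:
    """Promotion is allowed only when no gate is blocking."""
    return not blocking_gates(gates)


gates = [
    ApprovalGate("security-review", GateState.APPROVED),
    ApprovalGate("change-advisory", GateState.PENDING),
]

print(can_promote(gates))     # False
print(blocking_gates(gates))  # ['change-advisory']
```

Because the state is an enumerated value rather than free text, "has this gate cleared?" has exactly one answer, and the caller can report which gate is still blocking.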
Speed and governance aren't a tradeoff. They coexist by design.
The Claude Connectors Directory is a curated, reviewed set of integrations. Anthropic evaluates each server before listing it. Being approved is a signal of trust that carries weight for enterprise teams deciding which AI integrations to enable.
It also means discoverability at scale: engineering teams using Claude for DevOps workflows will find Harness natively. One-click OAuth connection, no API key management, no manual configuration.
This fits a broader pattern. The Google Cloud partnership brought Harness into Google's AI ecosystem through Vertex AI and Gemini CLI. The Cursor plugin brought it into the IDE. The Claude Connectors Directory brings it into conversational AI. In each case, the goal is the same: wherever developers are doing their best thinking and wherever AI is being asked to help with software delivery, Harness should be present with the right context for that AI to act reliably.
If you're already a Harness customer:
If you're new to Harness, sign up for free and connect from day one. Detailed steps are listed in the documentation.
The Harness Connector gives Claude the ability to act in your delivery system. The Knowledge Graph gives it the understanding to act well. Together, that's what reliable AI in software delivery actually looks like.


Modern data platforms are evolving rapidly, and Google Cloud BigQuery has become a core part of analytics, AI, and large-scale reporting architectures. Teams (including Harness) rely on BigQuery to process and analyze massive datasets, but managing schema changes in a secure, repeatable way can still be challenging.
Today, we’re excited to announce BigQuery support for Harness Database DevOps, enabling teams to bring the same automation, governance, and reliability they expect from application DevOps to their BigQuery deployments.
With this release, organizations can now manage BigQuery schema changes using pipeline-driven Database DevOps workflows directly within Harness, while also leveraging secure OIDC-based authentication for keyless access.
BigQuery helps organizations move fast with data, but database change management often remains manual and fragmented.
Common challenges include:
Without a standardized deployment process, teams struggle to balance speed, reliability, and security.
Harness Database DevOps now supports BigQuery as a first-class database platform, allowing teams to manage schema changes through automated, pipeline-driven workflows.
This means BigQuery schema changes can now be treated just like application code: versioned, tested, approved, and promoted through environments using Harness pipelines.
With BigQuery support, teams can:
The result is a modern Database DevOps workflow for BigQuery that helps teams release faster without sacrificing security or reliability.
Harness Database DevOps can now connect directly to BigQuery environments using the BigQuery JDBC connector, powered by the Simba BigQuery JDBC driver.
Example JDBC URL:
jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=YOUR_PROJECT_ID;DefaultDataset=YOUR_DATASET;Location=YOUR_REGION;
OAuth access tokens are injected automatically during authentication, removing the need for manual credential management.
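For teams that template this URL per environment, a small helper can assemble it from its three parameters. This is a minimal Python sketch: the function name and the input validation are illustrative assumptions, and the output simply mirrors the example URL format shown above.

```python
def build_bigquery_jdbc_url(project_id: str, dataset: str, location: str) -> str:
    """Assemble a BigQuery JDBC URL in the same shape as the example above."""
    base = "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443"
    props = {
        "ProjectId": project_id,
        "DefaultDataset": dataset,
        "Location": location,
    }
    for key, value in props.items():
        # ';' and '=' delimit properties in the URL, so reject them in values.
        if not value or ";" in value or "=" in value:
            raise ValueError(f"invalid value for {key}: {value!r}")
    return base + ";" + "".join(f"{key}={value};" for key, value in props.items())


url = build_bigquery_jdbc_url("my-project", "analytics", "US")
print(url)
```

No credentials appear in the URL, which is consistent with the token being supplied at authentication time rather than stored in configuration.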
Harness supports OIDC authentication using GCP Workload Identity Federation, allowing teams to securely authenticate to BigQuery without storing long-lived service account keys.
During pipeline execution:
This improves:
No static JSON keys are stored in Harness or delegate environments.
Use Harness pipelines to automate BigQuery schema deployments with repeatable workflows across environments.
Teams can:
Leverage Harness approval gates, RBAC, and policy enforcement to ensure safe production changes. This helps organizations introduce governance into analytics database deployments without slowing down delivery velocity.
Track every BigQuery deployment with:
This creates a more transparent and auditable deployment process for data teams.
As organizations increasingly rely on BigQuery to power analytics and AI workloads, database changes require the same level of automation and governance as application deployments.
By bringing BigQuery into Harness Database DevOps, teams can:
BigQuery support for Harness Database DevOps is now available.
To get started:
Learn more about setup in our documentation.

To learn more about using BigQuery with Harness Database DevOps, check out our documentation or schedule a demo.
Additional Resource - Warehouse Native BigQuery Integration


Releasing new software used to be a big deal. You would set aside a Saturday night, wake up the on-call engineer, push the code, and hope that nothing broke before Monday morning.
Then came feature flags, which changed everything without anyone noticing.
Feature flags let you separate deployment from release, so you can send code to production in a dormant state and turn it on for users when you're ready. No more 1 a.m. maintenance windows. We don't have to ship every feature in a release together anymore, or scramble to pull one back with a hotfix. Just code in production, off by default, and ready when you say so.
The tools have improved a lot since then. Feature flag tools these days are more than just on/off switches. The best ones have flag management, progressive delivery, real-time release monitoring, A/B testing, and AI-driven guardrail metrics all built right into your CI/CD pipeline. That changes how a release looks, how a rollback feels, and how confident your team is when they ship.
Here's a look at the best feature flag tools available, along with what each one does well and what to look for when picking the right one for your team.
A feature flag, or feature toggle, is a conditional block in your code that controls whether a new feature is active for a given user. Wrap a flag around a checkout page redesign, and you can push the code to production while keeping the new flow hidden from 99% of users. Set it to 1% as a canary, monitor your metrics, and gradually increase the rollout percentage if everything looks good.
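A minimal Python sketch of that checkout example follows. The flag key, the function names, and the hash-based bucketing scheme are illustrative assumptions. Real feature flag SDKs use their own bucketing algorithms and fetch the rollout percentage from a management service rather than taking it as an argument.

```python
import hashlib


def bucket(flag_key: str, user_id: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for a given flag."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(flag_key: str, user_id: str, rollout_percent: int) -> bool:
    """A user is in the rollout when their bucket falls below the percentage."""
    return bucket(flag_key, user_id) < rollout_percent


def checkout_page(user_id: str, rollout_percent: int) -> str:
    # The new flow ships to production but stays hidden unless the flag is on.
    if is_enabled("new-checkout", user_id, rollout_percent):
        return "new checkout flow"
    return "existing checkout flow"


print(checkout_page("user-42", 0))    # existing checkout flow
print(checkout_page("user-42", 100))  # new checkout flow
```

Hashing the flag key together with the user ID keeps each user's experience stable between requests, and raising the percentage only ever adds users to the rollout: anyone included at 1% is still included at 10%.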
Feature flag tools handle the whole lifecycle: creating flags, targeting users, rolling them out incrementally, monitoring their impact, and retiring flags once they've served their purpose.
Modern platforms add a few more layers on top of that:
The toggle itself isn't worth much. The safety net around it is.
Before you start looking at different tools, make sure you know what your team really needs. Some questions you should ask are:
Does it work with the CI/CD pipeline you already have? Your developers will work around a flag platform that is outside of your delivery workflow, not with it.
Can it connect flag exposure to your observability stack? You don't want three dashboards to cross-reference when something breaks at 3 a.m. You want one screen that tells you which feature caused the spike.
Will it scale with your traffic and your team? When you have millions of users, SDK performance, evaluation latency, and offline fallback are all important.
Does it cover governance for regulated environments? In healthcare, fintech, or anything touching PII, RBAC, approval workflows, immutable audit trails, and Policy as Code aren't optional.
How does it handle flag lifecycle management? Stale flags are technical debt. The best platforms include ownership assignment, sunset policies, and dashboards that surface flag age and usage frequency.
With those criteria in mind, here are the best tools to consider.
Harness FME is a developer-first platform that brings feature management, A/B testing, and release monitoring into one unified system. Built on the combined Split and Harness lineage, FME is designed for enterprise teams that want experimentation baked into their CI/CD pipeline, not bolted on as a separate workflow.
What makes FME stand out:
Best for: Enterprise engineering teams that want a single platform for feature flags, experimentation, and release monitoring, with deep CI/CD integration.
LaunchDarkly is one of the oldest feature flag platforms on the market. It's a popular choice for teams that want a flag-first product with mature SDK support for most major languages.
Some of its strengths are that it has a lot of SDK support, good targeting options, and a long history of managing features. Some teams may prefer other vendors for bundled analytics or warehouse-native analysis. Teams that do a lot of A/B testing often use LaunchDarkly with a separate analytics or stats engine, which makes things more complicated.
Best for: Teams whose primary need is feature flag management, with separate tooling for testing and observability.
Statsig has become a popular platform for product-led growth teams because its free tier includes feature flags, experimentation, and product analytics all in one place.
The platform's statistical engine is solid: it supports sequential testing and takes a sound approach to significance testing. With warehouse-native mode, you can run analysis on your own data infrastructure. Statsig is still maturing in enterprise governance, and its RBAC and audit features aren't yet as strong as regulated industries typically require.
Best for: Product-led growth teams that want flags, experiments, and analytics in one system without heavy enterprise requirements.
Ownership note: Statsig announced in September 2025 that it would join OpenAI. OpenAI said Statsig would continue operating independently and serving current customers, so buyers may want to watch how the roadmap evolves under new ownership.
Optimizely's roots are in web-based A/B testing, and it brings that history of experimentation into its feature flag product. The platform's statistical methods are well-established, and marketing teams that have used other Optimizely products are likely to choose it.
The downside is that Optimizely's origins still show in places. The product is more useful for web and front-end use cases and less useful for the kind of deep backend, infrastructure-level flag management that engineering teams often need. Developer-native tools tend to work better for teams focused purely on product engineering.
Best for: Marketing-engineering hybrid teams already invested in the Optimizely ecosystem who want to extend it to product feature testing.
PostHog is an open-source platform that bundles product analytics, feature flags, experimentation, and session replay together. It's a popular pick for early-stage companies that want a lot of capability without paying for multiple platforms.
The all-in-one approach works well at a smaller scale. As you grow, you may find that specialized tools go deeper on individual capabilities, particularly enterprise-level flag management and statistical rigor. The self-hosted option is a meaningful advantage for teams with strict data residency requirements.
Best for: Startups and growth teams that want product analytics and feature flags in one place, with a self-hosting option.
Flagsmith is a feature flag platform that is completely open source and can be hosted in the cloud or on your own server. It's a good choice for teams that need open-source flexibility (or strict self-hosting) but don't want to lose the polished product experience.
The platform does a good job of covering the basics, like targeting, segmentation, multivariate flags, and SDK support for most languages. It's not as heavy as enterprise platforms when it comes to advanced experimentation, AI-driven release monitoring, and deeply automated guardrails.
Best for: Teams with privacy requirements, self-hosting mandates, or a strong preference for open-source software.
Unleash is another open-source option with a strong following in Kubernetes-native shops. It's known for being straightforward to set up, easy to understand, and well-suited to teams that want full control over their tooling.
Like Flagsmith, Unleash handles flag management well but doesn't extend as far into experimentation or release intelligence. If your team primarily needs to safely gate features and host the platform yourself, Unleash is a solid choice.
Best for: Open-source-first teams, especially those running Kubernetes infrastructure.
ConfigCat markets itself as a simple, inexpensive feature flag service with clear prices and an easy setup. A lot of small to medium-sized teams choose it because they want to manage flags without the extra work that comes with a bigger platform.
The product includes the basics, such as targeting, segmentation, percentage rollouts, and connections to popular tools. It wasn't made to be a testing platform, so teams that need statistical analysis will have to use it with something else.
Best for: Small-to-midsize teams that want lightweight, budget-friendly flag management without enterprise complexity.
GrowthBook is an open-source feature flag platform originally built around warehouse-native experimentation. The premise: your experiment data is already in BigQuery, Snowflake, or Redshift, so it should be analyzed there rather than piped to a separate vendor.
For data teams that have invested heavily in their warehouse, GrowthBook is a strong fit. The statistical methods are rigorous (Bayesian and frequentist options, sequential testing, CUPED variance reduction), and the open-source model gives you full control over the platform.
Best for: Data teams that want serious warehouse-native experimentation with open-source control.
AWS AppConfig is Amazon's native configuration and feature flag service for teams operating entirely within the AWS ecosystem. It integrates cleanly with Lambda, ECS, EKS, and EC2, and runs as a fully managed service under your existing AWS account.
The trade-off is depth. AppConfig treats flags as part of broader application configuration. It isn't a purpose-built platform for experimentation or release intelligence. Teams that need advanced targeting, A/B testing, and release monitoring at the level of a dedicated tool will outgrow it quickly.
Best for: AWS-native teams with modest flag requirements who want to stay within the AWS ecosystem.
Once you've narrowed down your list, here are a few things to think about.
Feature flag tools started as a clever way to ship code that wasn't quite ready without breaking production. They've grown into something much larger: the foundation for safer releases, faster experimentation, and a development culture where shipping doesn't feel like gambling.
The best platforms bring feature flags, progressive delivery, real-time monitoring, and AI-driven guardrails together in one place, integrated with your CI/CD pipeline, so every release becomes a controlled experiment rather than a leap of faith.
Harness Feature Management & Experimentation brings flags, experimentation, and release monitoring into a single enterprise-grade platform, with AI-driven guardrails and deep CI/CD integration built in. Every deployment becomes a measurable, recoverable experiment instead of a gamble.
They mean the same thing. "Feature flag" and "feature toggle" are used interchangeably across the industry. Some teams use "toggle" for simple on/off switches and "flag" for more complex multivariate or targeted releases, but most platforms and engineers treat them as the same concept.
Flagsmith, Unleash, and GrowthBook are all capable of running in production at scale. The trade-off is usually in advanced experimentation, AI-driven release monitoring, and enterprise governance. If those aren't requirements, open source is a legitimate path. For teams where they are requirements, a managed enterprise platform typically saves more in engineering time than it costs.
Yes. Many early-stage products start with homegrown approaches using config files or environment variables. The cracks show later: targeting becomes hard to manage, there are no audit trails, and stale flags accumulate as silent technical debt. Most teams hit a threshold (usually around 20 to 30 active flags) where a dedicated platform pays for itself in saved engineering time.
The best platforms integrate directly with your CI/CD pipeline so flag updates can flow through GitOps workflows, CLI commands, or pipeline steps. That keeps flag changes in the same review and audit flow as code deployments. During an incident, you have one place to look: what changed, when, and who changed it.
You can run them separately, but you'll spend ongoing effort keeping data consistent across two systems. Unified platforms like Harness FME use the same flag, SDK, and exposure pipeline for both flag management and experimentation, which eliminates an operational pain point that most teams don't appreciate until they've lived with the split-system version.
Three habits cover most of it:


When Anthropic broke the news of Mythos and Project Glasswing, the security community did what it always does. It published a flurry of papers asking "What does this mean for security?" It's a reasonable instinct, but it's the wrong question.
The real question is: who actually owns the problem?
Even Anthropic's own guidance on preparing your security team for the AI era, comprehensive and well-reasoned as it is, lands squarely on steps that security teams can influence but cannot execute. Maintaining accurate inventories of exposed systems, decommissioning legacy services, and minimizing API exposure: these are all the right steps. They are also, unambiguously, engineering steps.
Security teams have owned these conversations for years, not because they were ever truly equipped to act on them, but because engineering was remarkably effective at passing the responsibility to someone else. That era is over.
Take attack surface reduction as a concrete example. Anthropic's recommendations are sound: know what you're exposing, shut down what you don't need, lock down your APIs. But a security team cannot decommission a legacy service. They cannot refactor an API. They can nag, escalate, and document, then watch the ticket sit in a backlog for six months.
Engineering has to take this on. Not reluctantly, not after repeated escalations, but as a core ownership responsibility. The framing of "security's job" versus "engineering's job" is a liability the industry can no longer afford.
This transition won't be easy. Changing ownership models inside organizations is political, slow, and often painful. But the alternative means maintaining siloed teams while AI-accelerated vulnerability exploitation scales faster than any manual process can respond. That isn't a strategy. It's a countdown.
Here's what needs to happen immediately:
This isn't a theoretical future risk. The wave is already forming offshore, and most organizations are still debating whether to build a seawall.
AI hasn't just made attackers faster; it has fundamentally changed the economics of exploitation. What once required a skilled threat actor, weeks of reconnaissance, and significant resources can now be automated, scaled, and deployed by someone with a capable model and a motivated prompt. Zero-day vulnerabilities that previously had a window of days or weeks before widespread exploitation are now being weaponized in hours. The asymmetry between attack and defense has never been more extreme.
Here's the uncomfortable truth: the traditional security model was never built for this speed. It was built for a world where humans attacked and humans defended, where there was time to deliberate, escalate, and patch. That world is gone.
Mythos doesn't wait for your quarterly security review. GlassWing doesn't care that your legacy service decommission is "on the roadmap for H2." AI-powered exploit tooling operates at machine speed. And right now, the defense side of that equation is still running on organizational clock time.
Organizations that recognize this moment and act on it will look very different in three years. Security and engineering will share OKRs, not just Slack channels. Remediation won't be a ticket handed off between teams; it will be a joint sprint. Attack surface reduction will be an engineering hygiene standard, not a security audit finding.
Organizations that don't adapt will face a different outcome. It won’t be a gradual decline, but a sudden, forced reorganization triggered by a breach that exposes exactly how brittle the old model was. The silo walls won't come down in a planned migration. They'll come down in an incident post-mortem.
Industry inflection points rarely announce themselves clearly, but this one has. The research is public and the threat models are documented. Anthropic and others have laid out precisely what needs to happen. The gap between knowing and doing is entirely organizational, and that gap is where the real risk lives.
The teams that start the hard conversations now about ownership, accountability, and shared responsibility are the ones that will be positioned to respond when the wave hits. And it will hit. The question isn't whether your organization needs to change. The question is whether you'll choose the terms.


Gartner expects worldwide AI software spending to hit $2.59 trillion in 2026, 47% more than organizations spent last year. The dollars are real and growing fast. But most organizations still can't measure the ROI of that spend.
The problem has two sides: developers and infrastructure. On the developer side, engineers are using AI to write nearly every line of new code, and leaders have no way to tell whether that spend is producing software that ships. On the infrastructure side, agents in production consume tokens with every customer interaction, every resolved ticket, every automated workflow, and the invoice is the only signal on whether any of it is worth what it costs.
Organizations can tell you what they spend on AI. Very few can tell you what they got for it. According to our 2026 State of Engineering Excellence report, 94% of engineering leaders say the metrics that matter most are missing from their current measurement frameworks.
Today, Harness is launching two products to close both gaps.
AI DLC Insights builds on Harness Software Engineering Insights and ties every AI-generated line of code to the PR, ticket, and deployment it produced, so engineering leaders can see where token spend is turning into shipped work and where it isn't.
Cloud & AI Cost Management extends Harness Cloud Cost Management with unit economics, anomaly detection, and budget governance for every dollar of AI infrastructure spend, so the question "is this agent worth what it costs?" finally has a number behind it.
"AI spend isn't the conversation anymore — ROI is. Every dollar we put into AI, from tokens consumed to customers served, has to earn its keep. That's what my executives are asking about today."
Josefa Roche, Sr. Cloud FinOps Engineer, Revionics, an Aptos Company
Every developer writing software today is coding with AI. Copilot, Cursor, Claude, Gemini: the tools vary but the pattern is universal. Adoption is not the problem.
The problem is that token spend has never been connected to efficiency or outcomes. Developers generate code with AI coding agents, a fraction of it ships, prompts are longer than necessary, and generated code gets rejected in review. Engineering leaders have no visibility into any of it — not the ship rate, not the wasted tokens, not the rejected code.
Harness CEO Jyoti Bansal recently described this behavior as tokenmaxxing: an engineer burns 500K tokens generating code that gets rejected in review. By the leaderboard, they beat the engineer who shipped a clean 50-line patch. Tokenmaxxing made sense as a forcing function when adoption was the goal. That phase has an expiration date.
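The leaderboard problem can be made concrete with a small Python sketch. This is only an illustration, not how AI DLC Insights computes anything: the `EngineerUsage` record, the `tokens_per_shipped_line` metric, and the 20K token figure for the second engineer are all assumptions introduced here. Only the 500K tokens and the 50-line patch come from the example above.

```python
from dataclasses import dataclass


@dataclass
class EngineerUsage:
    name: str
    tokens_used: int    # total tokens consumed by AI coding tools
    lines_shipped: int  # AI-assisted lines that passed review and were deployed


def tokens_per_shipped_line(u: EngineerUsage) -> float:
    """Tokens spent per shipped line; infinity when nothing shipped."""
    if u.lines_shipped == 0:
        return float("inf")
    return u.tokens_used / u.lines_shipped


usage = [
    # 500K tokens, all of it rejected in review
    EngineerUsage("engineer_a", tokens_used=500_000, lines_shipped=0),
    # clean 50-line patch; the 20K token figure is an assumed value
    EngineerUsage("engineer_b", tokens_used=20_000, lines_shipped=50),
]

# Ranking by raw consumption rewards the engineer who shipped nothing.
by_tokens = sorted(usage, key=lambda u: u.tokens_used, reverse=True)

# Ranking by tokens per shipped line puts the shipped patch first.
by_efficiency = sorted(usage, key=tokens_per_shipped_line)

print([u.name for u in by_tokens])
print([u.name for u in by_efficiency])
```

The two rankings disagree, which is the point: a consumption leaderboard and an outcome-based view can put the same two engineers in opposite order.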
AI DLC Insights includes a new on-machine developer agent that runs directly in the developer's environment. It observes the IDE and terminal in real time, captures every AI-generated line of code, records the token cost per model and tool, and maps that spend through the delivery chain to the PR, the ticket, and the deployment that shipped.
An engineering leader can now say "it cost us $5,200 in AI credits to fix that bug" and mean it. Here’s what’s in the release:

Fig. 1: AI DLC Insights gives engineering leaders a unified view of AI adoption, spend efficiency, and delivery impact across coding agents, teams, and workflows.
Once an AI agent ships to production, a different cost equation takes over. Every customer interaction, every resolved ticket, every automated workflow triggers inference. The spend is continuous, scales with usage, and in most organizations is visible only at the invoice level. That tells you which line item is growing, but tells you nothing about whether the spend growth is worth it.
A $28,000 monthly spend on a customer support agent is a completely different number depending on how many tickets it resolved. If it cost $0.60 per resolved ticket and the human alternative costs more, it is one of the best investments in your stack. If the math runs the other way, you are paying more for automation than the process it replaced. Most organizations cannot tell the difference today.
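The arithmetic behind that comparison is simple enough to sketch in a few lines of Python. The ticket volume and the human cost per ticket below are assumed values chosen for illustration (the volume is picked so the result lands near the $0.60 figure above); they are not numbers from Harness or from any customer.

```python
def cost_per_outcome(total_spend: float, outcomes: int) -> float:
    """Unit cost: total spend divided by the number of outcomes it produced."""
    if outcomes <= 0:
        raise ValueError("outcomes must be positive")
    return total_spend / outcomes


monthly_spend = 28_000.00     # monthly spend from the example above
resolved_tickets = 46_667     # assumed volume for illustration
human_cost_per_ticket = 4.00  # assumed cost of the human alternative

agent_cost = cost_per_outcome(monthly_spend, resolved_tickets)
print(f"${agent_cost:.2f} per resolved ticket")

if agent_cost < human_cost_per_ticket:
    print("The agent is cheaper than the process it replaced.")
else:
    print("The agent costs more than the process it replaced.")
```

The same $28,000 produces a very different answer if `resolved_tickets` drops to a few thousand, which is why the invoice total alone cannot settle the question.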
Cloud & AI Cost Management closes that gap. Harness connects directly to your AI providers and production agents, capturing spend at the level of each individual request and tying it to the agent, session, or workflow that triggered it. The same cost categories, budgets, and anomaly detection already running on your cloud spend now apply to every AI token your infrastructure consumes.
A finance leader can finally answer the question the business is asking: is this agent worth what it costs? Here’s what’s in the release:

Fig. 2: The AI Cost Unit Economics dashboard connects total AI spend to the metrics that matter, giving leaders a cross-provider breakdown of cost per token, per inference, and per session.

Fig. 3: AI spend, attributed by agent. At a glance: which agents are growing, which sessions are getting more expensive, and what AI cost looks like as a share of revenue.

Fig. 4: Run-level waterfall for a single agent run. The cost and latency of every step, every model call, and every tool invocation, with span attributes for debugging.
AI DLC Insights answers the developer question: is token spend turning into shipped work? Cloud & AI Cost Management answers the infrastructure question: is each agent worth what it costs in production? Both questions now have a direct answer in the same platform.
The first phase of enterprise AI was adoption. The next is about proving the tools are worth their cost. The organizations that can show where the money goes and what it produces will spend the next dollar with confidence. The rest will keep approving line items they can't explain.
AI DLC Insights and Cloud & AI Cost Management are available in beta now. [Learn more]


AI coding tools made code generation faster. Measuring what actually ships is the hard part.
Over the last eighteen months, tools like Cursor, Claude Code, Copilot, and Windsurf have fundamentally changed how software gets built. AI-generated pull requests are increasing, developers are producing more code than ever before, and workflows that once took hours now happen in minutes. But most organizations struggle to clearly explain what that investment is actually producing.
Only a fraction of AI-generated code ultimately survives review and reaches production, yet engineering leaders still lack visibility into which coding agents improve delivery performance and which workflows simply contribute to tokenmaxxing with no clear ROI.
That gap exists because traditional engineering systems were built for a world where development started with a commit. But AI fundamentally changed where the software development lifecycle begins. Development no longer starts with a commit. It starts with a prompt. The model choice, token consumption, generated code, review cycles, deployments, and production outcomes are now all part of the same engineering workflow. Measuring only what happens after code is committed is no longer enough.
That shift is what led Harness to evolve Software Engineering Insights into AI DLC Insights, to help organizations measure how AI-generated work moves through the entire development lifecycle from prompt to production.
These three operational gaps exist inside almost every team running AI at scale today:
These three gaps are exactly what AI DLC Insights is organized around. Together, they give engineering leaders a complete picture of what AI is producing inside their engineering organization, from the first prompt to the last deployment.
The first question starts with understanding what AI adoption actually looks like at the team and individual level. Seat counts and API usage aggregates give you a surface view. Understanding whether AI-generated code is actually making it into production requires something deeper.
Most engineering systems were never designed to observe AI-assisted development workflows directly. Source control can show what was committed. Billing systems can show token consumption. Neither can explain which generated code actually survived review, reached production, or improved delivery performance.
That is why AI DLC Insights introduces a new Agent that runs directly inside the developer environment. The agent observes AI interactions in real time, captures AI-generated code, tracks token consumption across coding agents and models, and connects that activity directly to commits, pull requests, deployments, and production outcomes.

What that makes visible:
Developer token consumption is increasing every month, but most teams still cannot explain which workflows are producing production-ready code and which are simply burning tokens.
That gap exists because token spend and engineering outcomes typically live in completely separate systems. Finance teams can see the monthly invoice, while engineering teams can see sprint activity and pull requests. Connecting token consumption directly to shipped code, deployment velocity, and engineering throughput is still difficult for most organizations.
As tokenmaxxing behaviors emerge, activity can easily be mistaken for impact. Some workflows generate meaningful production-ready code and improve delivery throughput, while others consume enormous amounts of tokens without improving what actually ships.

AI DLC Insights closes that gap, breaking down spend by developer, team, agent, and workflow:
Adoption and efficiency are inputs. Impact is the output. And the output is not lines of code generated or tokens consumed. It's features shipped, bugs resolved, lead time reduced, security posture improved, and customers getting better software faster.
More AI-generated code does not automatically produce those outcomes. Without the right visibility, AI adoption can quietly produce the opposite: more code volume with more review burden, more complexity with more regressions, faster generation with slower delivery cycles. The organizations that catch those patterns early are the ones that maintain quality as velocity increases.

AI DLC Insights connects AI activity to the delivery metrics that reveal what is happening downstream:
The first generation of engineering analytics platforms measured software delivery after the commit. The next generation will measure how humans and AI systems build software together.
Boards are no longer asking whether engineering teams are using AI coding tools. They’re asking whether the investment is improving software delivery in measurable ways. Whether teams are shipping more production-ready code. Whether delivery metrics are moving alongside token consumption. Whether the spend is generating real engineering leverage or just increasing the invoice.
Answering those questions requires visibility into how AI-generated code actually behaves across the full development lifecycle, from the prompt that created it to the deployment that shipped it.
That is what AI DLC Insights was built to deliver.
Ready to prove the ROI of your AI engineering investment? Request a demo to learn more.


The pace of AI spend has gotten ahead of most teams. New agents, new copilots, new flows powered by language models, all moving from prototype to production in weeks. Finance is often the first to flag it, because the data lives across provider invoices, gateway dashboards, observability tools, and cloud bills with no single source of truth. A small change to a prompt or a model can move spend by an order of magnitude. A retry loop in an agent can burn a month of budget in an afternoon.
Across customer and analyst conversations, the same questions keep surfacing. What are we spending on AI today, across providers and teams? How do we attribute that spend to the products, features, and customers driving it? At the unit level, not the invoice level, is each AI feature actually economical?
Today we're launching Harness Cloud & AI Cost Management, a new product that puts AI spend and cloud spend into the same system, with the same allocation, governance, and anomaly detection.
Cloud Cost Management has been part of Harness for years. Its primitives (cost categories, perspectives, budgets, anomaly detection) work because they meet teams where they already manage cloud. Cloud & AI Cost Management applies those same primitives to AI workloads and adds the granularity AI requires: session, agent, run, step, and individual LLM call. The deeper shift is unit economics. Every dollar of AI spend gets tied to the agent, session, and outcome it produced, so AI features can be evaluated by what they cost per outcome rather than what they show on the monthly invoice.
Unit economics surfaced natively for measuring AI outcomes:

Unified visibility across native LLM providers and managed AI services. OpenAI and Anthropic for direct API spend. AWS Bedrock and GCP Vertex AI for managed services. Spend is normalized across providers, so comparisons and analysis don't require custom pipelines.
Per-model and per-version cost tracking, with input and output token volumes, inference counts, and trends. Useful for evaluating model choice, watching the impact of a model upgrade, and identifying which models are growing fastest in spend.
Cost attributed to AI agents, whether internal copilots, customer-facing assistants, or background automations. Inferences, session cost, token usage, and trends surfaced per agent so engineering and product teams can evaluate cost-per-outcome at the agent level.

Attribute AI spend to any customer-defined construct, including business unit, product line, customer tier, or feature. Built on the existing cost categories framework, so the rules teams have already written for cloud chargeback apply to AI spend with no extra setup.
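As a rough mental model of rule-based attribution, the Python sketch below assigns each spend record to the first matching category and sends everything else to a default bucket. The record fields, rule format, and category names are hypothetical, invented for this illustration; they are not the Harness cost categories schema.

```python
from collections import defaultdict
from typing import Callable

Record = dict  # one spend record, e.g. {"agent": ..., "team": ..., "cost": ...}
Rule = tuple[str, Callable[[Record], bool]]  # (category name, predicate)


def allocate(records: list[Record], rules: list[Rule],
             default: str = "Unallocated") -> dict[str, float]:
    """Sum cost per category; the first matching rule wins."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        bucket = next((name for name, matches in rules if matches(rec)), default)
        totals[bucket] += rec["cost"]
    return dict(totals)


# Hypothetical AI spend records
records = [
    {"agent": "support-copilot", "team": "support", "cost": 120.0},
    {"agent": "code-review-bot", "team": "platform", "cost": 80.0},
    {"agent": "support-copilot", "team": "support", "cost": 45.5},
    {"agent": "batch-summarizer", "team": "data", "cost": 10.0},
]

# Hypothetical allocation rules, evaluated in order
rules: list[Rule] = [
    ("Customer Support", lambda r: r["team"] == "support"),
    ("Engineering", lambda r: r["team"] == "platform"),
]

totals = allocate(records, rules)
print(totals)
```

Keeping an explicit default bucket makes unattributed spend visible instead of silently dropping it.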

Cost per session, cost per multi-turn interaction, and token composition broken down by call. This is the level of detail provider billing APIs can't give. A multi-turn conversation that costs four times an average session because the agent is looping through a tool chain becomes visible, attributable, and fixable.
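One simple way to surface that kind of session is to compare each session's cost against a typical session. The Python sketch below uses the median as the baseline so that a single runaway session does not inflate its own threshold. That choice, the `flag_expensive_sessions` helper, and the sample costs are assumptions for illustration, not a description of Harness anomaly detection.

```python
from statistics import median


def flag_expensive_sessions(session_costs: dict[str, float],
                            multiple: float = 4.0) -> list[str]:
    """Return IDs of sessions costing at least `multiple` times the median."""
    baseline = median(session_costs.values())
    return sorted(
        sid for sid, cost in session_costs.items()
        if cost >= multiple * baseline
    )


# Hypothetical per-session costs in dollars: nine ordinary sessions and
# one that looped through a tool chain.
session_costs = {f"s{i}": 0.50 for i in range(1, 10)}
session_costs["s10"] = 4.00

flagged = flag_expensive_sessions(session_costs)
print(flagged)
```

A monthly average would absorb the looping session almost entirely; a per-session comparison is what makes it attributable to a specific conversation.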


Filter and group AI spend by the dimensions that matter for AI workloads:
Drill from business-level metrics down to raw cost data, with filters that compose the same way they do everywhere else in the product.

Most AI cost tools are point solutions. They show AI spend in their own dashboards with their own allocation model. That's useful for visibility, less useful when you're trying to govern AI alongside the cloud spend driving the rest of your infrastructure bill.
Existing Harness Cloud Cost Management customers get something different. The chargeback rules, cost categories, and budgets already written for cloud spend now apply to AI workloads. AI cost becomes another allocation in the same system, not a parallel workstream to reconcile separately.
The depth also goes further than provider billing APIs allow. AI spend can be analyzed at agent, session, run, and step level, down to the model and tool invoked at each step. Worst-case behavior surfaces as itself rather than averaged into a monthly number, and the same dimensions plug into cost categories, perspectives, and budgets.
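To illustrate what analysis at agent, session, run, and step level means in practice, the Python sketch below rolls call-level costs up to whichever levels are requested. The record layout, agent names, and costs are hypothetical and chosen only to show the idea.

```python
from collections import defaultdict

# Hypothetical call-level records: one entry per LLM call or tool invocation
calls = [
    {"agent": "support-bot", "session": "s1", "run": "r1", "step": "plan", "cost": 0.25},
    {"agent": "support-bot", "session": "s1", "run": "r1", "step": "search_tool", "cost": 0.5},
    {"agent": "support-bot", "session": "s1", "run": "r1", "step": "search_tool", "cost": 0.5},
    {"agent": "support-bot", "session": "s2", "run": "r2", "step": "plan", "cost": 0.25},
    {"agent": "sales-bot", "session": "s3", "run": "r3", "step": "plan", "cost": 0.125},
]


def roll_up(calls: list[dict], *levels: str) -> dict[tuple, float]:
    """Sum call costs grouped by the given levels, e.g. ("agent", "session")."""
    totals: dict[tuple, float] = defaultdict(float)
    for call in calls:
        key = tuple(call[level] for level in levels)
        totals[key] += call["cost"]
    return dict(totals)


by_agent = roll_up(calls, "agent")
by_step = roll_up(calls, "agent", "session", "run", "step")

print(by_agent)
# The repeated search_tool step in run r1 stands out at step level.
print(by_step[("support-bot", "s1", "r1", "search_tool")])
```

The agent-level total shows only that one agent costs more than another. The step-level view shows which step inside which run is responsible.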
Three ingestion paths let teams adopt the depth that matches their stage. Provider connectors give fast unified visibility across OpenAI, Anthropic, Bedrock, and Vertex. Gateway integration adds per-request attribution. OpenTelemetry traces give full session and workflow detail. Most teams start with connectors and add depth as their AI footprint grows.
A customer-support copilot might show $28,000 on a monthly invoice. That number alone doesn't tell you whether the bot is earning its keep. The more useful number is $0.60 per resolved ticket. And when a session costs $4 because the agent is looping through tools it shouldn't be using, that surfaces as a code problem you can fix, not a line item to explain after the fact.
Existing Cloud Cost Management customers can enable AI Cost Management today. For everyone else, request a demo.
Key Takeaway: Harness AI Test Automation now runs existing Playwright suites without code changes, adds AI-powered failure triage, and integrates test results directly into build and deployment pipelines.
Playwright has become the industry standard for end-to-end testing. Most engineering teams already have suites (sometimes hundreds of specs) running against their applications.
Writing the tests isn't the hard part anymore. Running them reliably, at CI speed, with meaningful feedback when things break: that's where teams still struggle.
The numbers tell the story:
Teams at Google, Dropbox, and Spotify have each built dedicated internal systems just to manage test flakiness and infrastructure. That's engineering investment that should go toward the product.
Harness AI Test Automation now lets you bring your existing Playwright projects and run them natively on the platform.
Your playwright.config, your spec files, your package.json scripts stay in your repo, exactly where they live today. Point Harness at your project root, and we run your suite using your config, extending it with reporters and trace settings that power AI triage and the Tests tab. No code changes required.
Why this matters:
Teams have invested months, often years, building and stabilizing their Playwright suites. A testing platform shouldn't ask you to throw that away and start over. Your stable tests stay exactly as they are. Tests that are flaky or hard to maintain can gradually evolve into AI-generated intent-based tests when you're ready, but there's no rewrite tax to get started.
Run in the cloud with parallel workers. No grid to configure, no nodes to scale, no browser images to maintain. Need to test an application behind a firewall? Secure tunnels handle private apps without exposing your network.
When a test fails, Harness automatically classifies it: regression, flaky, performance, or environment issue. You get the failure location, retry patterns, likely root cause, and a recommended fix. No more sifting through stack traces to figure out if the problem is real.
Engineers spend time fixing problems, not investigating whether the problem is real.
Some assertions are hard to express in code. "Does this page look correct?" "Is the checkout flow in a valid state?" "Does the error message make sense for this scenario?"
With the Harness SDK, you can add AI-powered assertions directly into your Playwright scripts. Hard-to-write assertions become simple natural-language questions. No complex selector logic, no brittle pixel comparisons. Your scripts stay in Playwright. The assertions just get smarter.
Playwright runs are native pipeline steps, not a service bolted onto your CI. If tests fail, the pipeline fails and the code is blocked from production. Every deployment is validated, and every result is tied to a specific commit.
No context switching to an external dashboard. Results live in the pipeline's Tests tab, alongside your build and deploy stages.
When Playwright runs locally, one developer's test results are invisible to the rest of the team. Failures get investigated in isolation. Patterns go unnoticed. Knowledge stays siloed.
On Harness, every execution is visible to every developer. Teams can review each other's test runs, spot recurring failures together, and build a shared understanding of test health across the entire suite.
Test results are connected to the commit that triggered them and the deployment they validated. When something breaks in production, you can trace back through the exact test run, the exact code change, and the exact environment, all in one place.
Most external test execution services solve one problem well: running browsers at scale. But they leave you to stitch together the rest. CI integration, reporting, triage, and quality gating are your responsibility.
With native pipeline integration:
This isn't about choosing between scripted tests and AI. It's about using each where it's strongest.
Playwright delivers the reliable, repeatable execution your Harness CI/CD pipeline demands. Harness AI layers intelligence on top: triaging failures so you don't waste cycles investigating, generating assertions that would be painful to hand-code, and eventually creating new test cases from your requirements and code.
Bring your Playwright suite to Harness AI Test Automation. Connect your repo, point us at your project root, and run your first execution in minutes, with AI failure triage included.
Interested in trying this out? Please reach out to ait-interest@harness.io.
Q1: Can I use my existing playwright.config without changes? Yes. Harness reads your existing playwright.config, spec files, and package.json scripts directly from your repo. No migration, no wrapper config, no reformatting. Point Harness at your project root and your suite runs as-is.
Q2: How does Harness handle flaky Playwright tests? When a test fails, Harness automatically classifies the failure — regression, flaky, performance, or environment issue — and surfaces the likely root cause alongside a recommended fix. Instead of sifting through raw logs, engineers see a verdict on whether the failure is real before they spend time investigating it.
Q3: Do I need to manage browser infrastructure or Docker images? No. Harness runs your Playwright suite in the cloud with parallel workers. Browser dependencies, Docker images, shard configuration, and compute scaling are all handled by the platform. For applications behind a firewall, secure tunnels support private app testing without exposing your network.
Q4: How is this different from BrowserStack or LambdaTest? External test grids solve browser execution at scale but leave CI integration, failure triage, and quality gating to you. With Harness, test results live natively in your pipeline, failures automatically block deployments, and AI triage is built in — no separate observability tool or custom webhook configuration required.
Q5: Can I add AI-powered assertions to my existing Playwright scripts? Yes, via the Harness SDK. You can add natural-language assertions directly into your existing Playwright scripts — things like "is the checkout flow in a valid state?" or "does this error message make sense for this scenario?" — without complex selector logic or brittle pixel comparisons. Your scripts stay in Playwright; the assertions just get smarter.


On May 16th, 2026, inspired by the growing MongoDB and DevOps community in Bengaluru, we partnered with the Namma MUG community to bring together engineers exploring automation, CI/CD, Infrastructure as Code, and database migration strategies for modern applications. It was an event we at Harness had been looking forward to for a long time: our first Database DevOps community event in India focused on MongoDB and modern database automation practices.
The event was a deep dive for experts into how database automation can work with MongoDB easily, without needing manual steps.

My session on the OSS Native Mongo Executor initiative was attended by several engineers already using tools like Liquibase, Flyway, and ORM-driven migration workflows. That led to incredibly valuable conversations around what Database DevOps should look like for MongoDB-native environments.
Interestingly, many attendees wanted to understand:
We also had several deep discussions around CI/CD production rollout strategies and the differences between native Mongo execution and traditional relational migration engines.
These discussions were incredibly insightful because they showed that teams are no longer thinking only about “Database Scripts”; they are thinking about full database delivery workflows integrated into DevOps platforms.
One clear thing we heard throughout all our discussions was how much people want easier ways to get started and more hands-on examples for working with MongoDB DevOps. People kept asking us for simple guides for beginners, real examples of how to set up Continuous Integration and Continuous Delivery (CI/CD), starting templates, and clear steps for moving and rolling back databases from start to finish. We also got into some deep technical talks about handling complex queries, moving databases while they are live, and making sure our deployments are reliable, especially when we talk about advanced ways to undo changes.
A lot of the attendees were really curious about how our MongoDB-native ways of doing migrations are different from the older, traditional database methods. That led us into bigger discussions about why using native MongoDB tools is important, how we manage schema changes in NoSQL, and the unique problems we face with document databases as we move from simple open-source tools to big enterprise-level Database DevOps systems. Overall, the reaction to our new OSS Native Mongo Executor was fantastic! It was clear that people really liked our approach of building Database DevOps features that fit naturally with MongoDB, instead of trying to force old relational rules onto a NoSQL system.
The future of Database DevOps is expanding beyond relational systems, and it’s exciting to see the MongoDB community helping shape that journey with us. A huge thank you to everyone who joined us, especially the speakers and community members who made the event successful: Naveen Kumar, Narendra Gottipati, Pritesh Kiri, and Aripriya Basu.
For us at Harness, this meetup made us realise something important: the community is actively looking for better ways to automate MongoDB operations while maintaining reliability, governance, and developer velocity. We have many more events coming up that you can join; see the Harness Events Calendar.


