
AI is changing both what you build and how you build it - at the same time. Today, Harness is announcing two new products to secure both: AI Security, a new product to discover, test, and protect AI running in your applications, and Secure AI Coding, a new capability of Harness SAST that secures the code your AI tools are writing. Together, they further extend Harness's DevSecOps platform into the age of AI, covering the full lifecycle from the first line of AI-generated code to the models running in production.
In November, Harness published our State of AI-Native Application Security report, a survey of hundreds of security and engineering leaders on how AI-native applications are changing your threat surface. The findings were stark: 61% of new applications are now AI-powered, yet most organizations lack the tools to discover what AI models and agents exist in their environments, test them for vulnerabilities unique to AI, or protect them at runtime. The attack surface has expanded dramatically — but the tools to defend it haven't kept up.
The picture is equally concerning on the development side. Our State of AI in Software Engineering report found that 63% of organizations are already using AI coding assistants - tools like Claude Code, Cursor, and Windsurf - to write code faster. But faster isn't safer. AI-generated code has the same vulnerabilities as human-written code, but now with larger and more frequent commits. AppSec programs that were already stretched thin are now breaking under the volume and velocity.
The result is a blind spot on both sides of the AI equation - what you're building, and what you're building with. Today, Harness is closing that gap.
Most security vendors are stuck in their lane. Shift-left tools catch vulnerabilities in code before they reach production. Runtime protection tools block attacks after applications are deployed. And the two rarely talk to each other.
Harness was built on a different premise: real DevSecOps means connecting every stage of the software delivery lifecycle, and closing the loop between what you find in production and what you fix in code.
That's what the Harness platform does today. Application Security Testing brings SAST and SCA directly into the development workflow, surfacing vulnerabilities where they're faster and cheaper to fix. SCS ensures the integrity of artifacts from build to deploy, while STO provides a unified view of security posture — along with policy and governance — across the entire organization.
As code ships to production, Web Application & API Protection monitors and defends applications and APIs in real time, detecting and blocking attacks as they happen. And critically, findings from runtime don't disappear into a security team's backlog — they flow back to developers to address root causes before the next release.
The result is a closed loop: find it in code, protect it in production, fix it fast. All on a single, unified platform.
Today, we're extending that loop into AI - on both sides. AI is reshaping what you build and how you build it simultaneously. A platform that can only address one side of that equation leaves you exposed on the other. Harness closes both gaps.
In the State of AI-Native Application Security, 66% of respondents said they are flying blind when it comes to securing AI-native apps. 72% call shadow AI a gaping chasm in their security posture. 63% believe AI-native applications are more vulnerable than traditional IT applications. They’re right to be concerned.
Harness AI Security is built on the foundation of our API security platform. Every LLM call, every MCP server, every AI agent communicating with an external service does so via APIs. Your AI attack surface isn't separate from your API attack surface; it's an expansion of it. AI threats introduce new vectors like prompt injection, model manipulation, and data poisoning on top of the API vulnerabilities your teams already contend with. There is no AI security without API security.
With the launch of AI Security, we are introducing AI Discovery in General Availability (GA). AI security starts where API security starts: discovery. You can't assess or mitigate risk from AI components you don't know exist. Harness already continuously monitors your environment for new API endpoints the moment they're deployed. Recognizing LLMs, MCP servers, AI agents, and third-party GenAI services like OpenAI and Anthropic is a natural extension of that. AI Discovery automatically inventories your entire AI attack surface in real time, including calls to external GenAI services that could expose sensitive data, and surfaces runtime risks, such as unauthenticated APIs calling LLMs, weak encryption, or regulated data flowing to external models.
Beyond discovering and inventorying your AI application components, we are also introducing AI Testing and AI Firewall in Beta, extending AI Security across the full discover-test-protect lifecycle.
AI Testing actively probes your LLMs, agents, and AI-powered APIs for vulnerabilities unique to AI-native applications, including prompt injection, jailbreaks, model manipulation, data leakage, and more. These aren't vulnerabilities that a traditional DAST tool is designed to find. AI Testing was purpose-built for AI threats, continuously validating that your models and the APIs that expose them behave safely under adversarial conditions. It integrates directly into your existing CI/CD pipelines, so AI-specific security testing becomes part of every release — not a one-time audit.
AI Firewall actively protects your AI applications from AI-specific threats, such as the OWASP Top 10 for LLM Applications. It inspects and filters LLM inputs and outputs in real time, blocking prompt injection attempts, preventing sensitive data exfiltration, and enforcing behavioral guardrails on your models and agents before an attack can succeed. Unlike traditional WAF rules that require manual tuning for every new threat pattern, AI Firewall understands AI-native attack vectors natively, adapting to the evolving tactics attackers use against generative AI.
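To make the input/output filtering concrete, here is a deliberately simplified toy sketch of the idea: screen prompts for injection markers on the way in and mask sensitive-looking data on the way out. The patterns and function names are illustrative assumptions only; a production firewall like the one described here relies on learned, adaptive detection rather than static regexes.

```python
import re

# Toy patterns for illustration; a real AI firewall uses adaptive
# classifiers, not a fixed regex list.
INJECTION_PATTERNS = [
    r"ignore (all|previous|prior) instructions",
    r"reveal your system prompt",
]
SECRET_PATTERNS = [
    r"\b(?:\d[ -]?){13,16}\b",  # card-number-like digit runs
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt looks like an injection attempt."""
    return any(re.search(p, prompt, re.IGNORECASE) for p in INJECTION_PATTERNS)

def redact_output(text: str) -> str:
    """Mask output spans that match sensitive-data patterns."""
    for p in SECRET_PATTERNS:
        text = re.sub(p, "[REDACTED]", text)
    return text
```

The point of the sketch is the placement, not the patterns: checks run on both sides of the model call, before an attack can succeed or data can leave.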
Harness AI Security with AI Discovery is now available in GA, while AI Testing and AI Firewall are available in Beta.
"As AI-assisted development becomes standard practice, the security implications of AI-generated code are becoming a material blind spot for enterprises. IDC research indicates developers accept nearly 40% of AI-generated code without revision, which can allow insecure patterns to propagate as organizations increase code output faster than they expand validation and governance, widening the gap between development velocity and application risk."
— Katie Norton, Research Manager, DevSecOps, IDC
AI Security addresses the risks inside your AI-native applications. Secure AI Coding addresses a different problem: the vulnerabilities your AI tools are introducing into your codebase.
Developers are generating more code than ever, and shipping it faster than ever. AI coding assistants now contribute to the majority of new code at many organizations — and nearly half (48%) of security and engineering leaders are concerned about the vulnerabilities that come with it. AI-generated code arrives in larger commits, at higher frequency, and often with less review than human-written code would receive.
SAST tools catch vulnerabilities at the PR stage — but by then, AI-generated code has already been written, reviewed, and often partially shipped. Harness SAST's new Secure AI Coding capability moves the security check earlier to the moment of generation, integrating directly with AI coding tools like Cursor, Windsurf, and Claude Code to scan code as it appears in the IDE. Developers never leave their workflow. They see a vulnerability warning inline, alongside a prompt to send the flagged code back to the agent for remediation — all without switching tools or even needing to trigger a manual scan.
"Security shouldn't be an afterthought when using AI dev tools. Our collaboration with Harness kicks off vulnerability detection directly in the developer workflow, so all generated code is screened from the start." — Jeff Wang, CEO, Windsurf

What sets Secure AI Coding apart from simpler linting tools is what happens beneath the surface. Rather than pattern-matching the AI-generated code in isolation, it leverages Harness's Code Property Graph (CPG) to trace how data flows through the entire application - before, through, and after the AI-generated code in question. That means Secure AI Coding can surface complex vulnerabilities like injection flaws and insecure data handling that only become visible in the context of the broader codebase. The result is security that understands your application - not just the last thing an AI assistant wrote.
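As a hypothetical illustration of the class of bug this catches (not Harness's analysis itself): each function below looks innocent when pattern-matched in isolation, but tracing data flow across them reveals a SQL injection.

```python
import sqlite3

def get_filter(request_args: dict) -> str:
    # Harmless in isolation: just reads a request parameter.
    return request_args.get("name", "")

def build_query(name_filter: str) -> str:
    # Also harmless in isolation: just formats a string.
    return f"SELECT id, email FROM users WHERE name = '{name_filter}'"

def handler(conn: sqlite3.Connection, request_args: dict):
    # Only whole-program data-flow analysis sees that untrusted input
    # from get_filter reaches the SQL string in build_query unescaped.
    query = build_query(get_filter(request_args))
    return conn.execute(query).fetchall()
```

A payload like `x' OR '1'='1` passed as the `name` parameter returns every row, which is exactly the kind of flaw that only becomes visible with context spanning the functions before and after the AI-generated snippet.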
When we deployed AI across our own platform, our AI ecosystem grew faster than our visibility into it. We needed a way to track every API call, identify sensitive data exposure, and monitor calls to external vendors — including OpenAI, Vertex AI, and Anthropic — without slowing down our engineering teams.
Deploying AI Security turned that black box into a transparent, manageable environment. Some milestones from our last 90 days:
The shift wasn't just operational — it was cultural. We moved from reactive monitoring to proactive defense. As our team put it: "Securing AI is foundational for us. Because our own product runs on AI, it must be resilient and secure. We use our own AI Security tools to ensure that every innovation we ship is backed by the highest security standards."
AI is moving fast. Your attack surface is expanding in two directions at once - inside the applications you're building, and inside the code your teams are generating to build them.
Harness AI Security and Secure AI Coding are available now. Whether you're trying to get visibility into the AI running in your environment, test it for vulnerabilities before attackers do, or stop insecure AI-generated code from reaching production, the Harness platform is ready.
Talk to your account team about AI Security. Get a live walkthrough of AI Discovery, AI Testing, and AI Firewall, and see how your AI attack surface maps against your existing API security posture.
Already a Harness CI customer? Start a free trial of Harness SAST - including Secure AI Coding. Connect it to your AI coding assistant, and see what's shipping in your AI-generated code today.

Today, Harness is announcing the General Availability of Artifact Registry, a milestone that marks more than a new product release. It represents a deliberate shift in how artifact management should work in secure software delivery.
For years, teams have accepted a strange reality: you build in one system, deploy in another, and manage artifacts somewhere else entirely. CI/CD pipelines run in one place, artifacts live in a third-party registry, and security scans happen downstream. When developers need to publish, pull, or debug an artifact, they leave their pipelines, log into another tool, and return to finish their work.
It works, but it’s fragmented, expensive, and increasingly difficult to govern and secure.
At Harness, we believe artifact management belongs inside the platform where software is built and delivered. That belief led to Harness Artifact Registry.
Artifact Registry started as a small, high-ownership bet inside Harness: a dedicated team with a clear thesis that artifact management shouldn't be a separate system developers have to leave their pipelines to use. We treated it like a seed startup inside the company, moving fast with direct customer feedback and a single-threaded leader driving the vision. The message from enterprise teams was consistent: they didn't want to stitch together separate tools for artifact storage, open source dependency security, and vulnerability scanning.
So we built it that way.
In just over a year, Artifact Registry moved from concept to core product. What started with a single design partner expanded to double-digit enterprise customers pre-GA – the kind of pull-through adoption that signals we've identified a critical gap in the DevOps toolchain.
Today, Artifact Registry supports a broad range of container formats, package ecosystems, and AI artifacts, including Docker, Helm (OCI), Python, npm, Go, NuGet, Dart, Conda, and more, with additional support on the way. Enterprise teams are standardizing on it across CI pipelines, reducing registry sprawl, and eliminating the friction of managing diverse artifacts outside their delivery workflows.
One early enterprise customer, Drax Group, consolidated multiple container and package types into Harness Artifact Registry and achieved 100 percent adoption across teams after standardizing on the platform.
As their Head of Software Engineering put it:
"Harness is helping us achieve a single source of truth for all artifact types, containerized and non-containerized alike, making sure every piece of software is verified before it reaches production." - Jasper van Rijn
In modern DevSecOps environments, artifacts sit at the center of delivery. Builds generate them, deployments promote them, rollbacks depend on them, and governance decisions attach to them. Yet registries have traditionally operated as external storage systems, disconnected from CI/CD orchestration and policy enforcement.
That separation no longer holds up against today’s threat landscape.
Software supply chain attacks are more frequent and more sophisticated. The SolarWinds breach showed how malicious code embedded in trusted update binaries can infiltrate thousands of organizations. More recently, the Shai-Hulud 2.0 campaign compromised hundreds of npm packages and spread automatically across tens of thousands of downstream repositories.
These incidents reveal an important business reality: risk often enters early in the software lifecycle, embedded in third-party components and artifacts long before a product reaches customers. When artifact storage, open source governance, and security scanning are managed in separate systems, oversight becomes fragmented. Controls are applied after the fact, visibility is incomplete, and teams operate in silos. The result is slower response times, higher operational costs, and increased exposure.
We saw an opportunity to simplify and strengthen this model.

By embedding artifact management directly into the Harness platform, the registry becomes a built-in control point within the delivery lifecycle. RBAC, audit logging, replication, quotas, scanning, and policy enforcement operate inside the same platform where pipelines run. Instead of stitching together siloed systems, teams manage artifacts alongside builds, deployments, and security workflows. The outcome is streamlined operations, clearer accountability, and proactive risk management applied at the earliest possible stage rather than after issues surface.
Security is one of the clearest examples of why registry-native governance matters.
Artifact Registry delivers this through Dependency Firewall, a registry-level enforcement control applied at dependency ingest. Rather than relying on downstream CI scans after a package has already entered a build, Dependency Firewall evaluates dependency requests in real time as artifacts enter the registry. Policies can automatically block components with known CVEs, license violations, excessive severity thresholds, or untrusted upstream sources before they are cached or consumed by pipelines.

Artifact quarantine extends this model by automatically isolating artifacts that fail vulnerability or compliance checks. If an artifact does not meet defined policy requirements, it cannot be downloaded, promoted, or deployed until the issue is addressed. All quarantine and release actions are governed by role-based access controls and fully auditable, ensuring transparency and accountability. Built-in scanning powered by Aqua Trivy, combined with integrations across more than 40 security tools in Harness, feeds results directly into policy evaluation. This allows organizations to automate release or quarantine decisions in real time, reducing manual intervention while strengthening control at the artifact boundary.
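The ingest-time gating described above can be sketched in a few lines. This is a minimal illustration of the decision logic, not Harness's implementation; the policy values and field names are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    QUARANTINE = "quarantine"

@dataclass
class ArtifactScan:
    name: str
    max_cve_severity: float            # CVSS score of worst finding; 0.0 if clean
    licenses: list = field(default_factory=list)
    trusted_source: bool = True

# Hypothetical policy thresholds for illustration only.
SEVERITY_THRESHOLD = 7.0
DENIED_LICENSES = {"AGPL-3.0"}

def evaluate(scan: ArtifactScan) -> Verdict:
    """Gate an artifact at ingest, before pipelines can consume it."""
    if scan.max_cve_severity >= SEVERITY_THRESHOLD:
        return Verdict.QUARANTINE
    if any(lic in DENIED_LICENSES for lic in scan.licenses):
        return Verdict.QUARANTINE
    if not scan.trusted_source:
        return Verdict.QUARANTINE
    return Verdict.ALLOW
```

The key design point is where the check runs: at the registry boundary, so a quarantined artifact is never cached or pulled by a pipeline in the first place.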

The result is a registry that functions as an active supply chain control point, enforcing governance at the artifact boundary and reducing risk before it propagates downstream.
General Availability signals that Artifact Registry is now a core pillar of the Harness platform. Over the past year, we’ve hardened performance, expanded artifact format support, scaled multi-region replication, and refined enterprise-grade controls. Customers are running high-throughput CI pipelines against it in production environments, and internal Harness teams rely on it daily.
We’re continuing to invest in:
Modern software delivery demands clear control over how software is built, secured, and distributed. As supply chain threats increase and delivery velocity accelerates, organizations need earlier visibility and enforcement without introducing new friction or operational complexity.
We invite you to sign up for a demo and see firsthand how Harness Artifact Registry delivers high-performance artifact distribution with built-in security and governance at scale.

TLDR: We have rolled out Project Movement: the ability to transfer entire Harness projects between Organizations with a few clicks. It's been our most-requested Platform feature for a reason. Your pipelines, configurations, and the rest come along for the ride.
In Harness, an Account is the highest-scoped entity. It contains organizations and projects. An organization is the space that represents your business unit or team and helps you manage users, access, and shared settings in one place. Within an organization, a project is where your teams do their day-to-day work, such as building pipelines, managing services, and tracking deployments. Projects keep related resources grouped together, making it easier to collaborate, control permissions, and scale across teams.
The main benefit of keeping organizations and projects separate is strong isolation and predictability. By not allowing projects to move between organizations, you can ensure that each organization serves as a rigid boundary for security, RBAC, governance, billing, and integrations. Customers could trust that once a project was created within an org, all its permissions, secrets, connectors, audit history, and compliance settings would remain stable and wouldn’t be accidentally inherited or lost during a move. This reduced the risk of misconfiguration, privilege escalation, broken pipelines, or compliance violations — especially for large enterprises with multiple business units or regulated environments.

However, imagine this scenario: last quarter, your company reorganized around customer segments. This quarter, two teams merged. Next quarter, who knows—but your software delivery shouldn't grind to a halt every time someone redraws the org chart.
We've heard this story from dozens of customers: the experimental project that became critical, the team consolidation that changed ownership, the restructure that reshuffled which team owns what. And until now, moving a Harness project from one Organization to another meant one thing: start from scratch.
Not anymore.

That's why we have rolled out Project Movement—the ability to transfer entire Harness projects between Organizations with a few clicks. It's been our most-requested Platform feature for a reason. Your pipelines, configurations, and the rest come along for the ride.
You're looking at 47 pipelines, 200+ deployment executions, a dozen services, and countless hours of configuration work. The company's org chart says this project now belongs to a different team. Deep breath.
Click the menu. Select "Move Project." Pick your destination Organization.

The modal shows you what might break—Organization-level connectors, secrets, and templates that the project references. Not an exhaustive list, but enough to know what you're getting into.

Type the project identifier to confirm.

Done.

Your project is now in its new Organization. Pipelines intact. Execution history preserved. Templates, secrets, connectors—all right where you left them. The access control migration happens in the background while you grab coffee.
What used to take days of YAML wrangling and "did we remember to migrate X?" conversations now takes minutes.
To summarize:
To move a Harness project between organizations:
1. Open the project menu and select “Move Project.”
2. Choose the destination organization.
3. Review impacted organization-level resources.
4. Confirm by typing the project identifier.
5. Monitor access control migration while pipelines remain intact.
Here's what transfers automatically when you move a project:
Access control follows along too: role bindings, service accounts, user groups, and resource groups. This happens asynchronously, so the move doesn't block, but you can track progress in real-time.
The project becomes immediately usable in its new Organization. No downtime, no placeholder period, no "check back tomorrow."
Let's talk about what happens to Organization-level resources and where you'll spend some time post-move.
Organization-scoped resources don't move—and that makes sense when you think about it. That GitHub connector at the Organization level? It's shared across multiple projects. We can't just yank it to the new Organization. So after moving, you'll update references that pointed to:
After the move, you'll update these references in your pipelines and configurations. Click the resource field, select a replacement from the new Organization or create a new one, and save. Rinse and repeat. The pre-move and post-move guide walks through the process.
A few CD features aren't supported yet but are on the roadmap: GitOps entities and Continuous Verification don't move with the project. If your pipelines use these, you'll need to manually reconfigure them in the new Organization after the move. The documentation has specific guidance on supported modules and entities.
The Harness hierarchical model, Account > Organization > Project, exists for strong isolation and predictable security boundaries. Moving projects doesn't compromise that architecture. Here's why: Organization-level resources stay put. Your GitHub connectors, cloud credentials, and secrets remain scoped to their Organizations. When a project moves, it doesn't drag sensitive org-wide resources along; it references new ones in the destination. This means your security boundaries stay intact, RBAC policies remain predictable, and teams can't accidentally leak credentials across organizational boundaries. The project moves. The isolation doesn't.
A platform engineering team had a familiar problem: three different product teams each had their own Harness Organization with isolated projects. Made sense when the teams were autonomous. But as the products matured and started sharing infrastructure, the separation became friction.
The platform team wanted to consolidate everything under a single "Platform Services" Organization for consistent governance and easier management. Before project movement, that meant weeks of work—export configurations, recreate pipelines, remap every connector and secret, test everything, hope nothing broke.
With project movement, they knocked it out in an afternoon. Move the projects. Update references to Organization-level resources. Standardize secrets across the consolidated projects. Test a few deployments. Done.
The product teams kept shipping. The platform team got its unified structure. Nobody lost weeks to migration work.
Moving a project requires two permissions: Move on the source project and Create Project in the destination Organization. Both sides of the transfer need to agree—you can't accidentally move critical projects out of an Organization or surprise a team with unwanted projects.
When you trigger a move, you'll type the project identifier to confirm.
A banner sticks around for 7 days post-move, reminding you to check for broken references. Use that week to methodically verify everything, especially if you're moving a production project.
Our recommendation: Try it with a non-production project first. Get a feel for what moves smoothly and what needs attention. Then tackle the production stuff with confidence.
On the surface, moving a project sounds simple: just change where it lives, and you're done. But in reality, a Harness project is a deeply connected system.
Your pipelines, execution history, connectors, secrets, and audit logs are all tied together behind the scenes. Historically, Harness identified these components using their specific "address" in the hierarchy. That meant if a project moved, every connected entity would need its address updated across multiple services at the same time. Doing that safely without breaking history or runtime behavior was incredibly risky.
To solve this, we re-architected the foundation.
We stopped tying components to their location and introduced stable internal identifiers. Now, every entity has a unique ID that travels with it, regardless of where it lives. When you move a project, we simply update its parent relationship. The thousands of connected components inside don’t even realize they’ve moved.
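The shift from location-based addresses to stable identifiers can be sketched as follows. This is a simplified model for illustration, with assumed class and function names, not the actual Harness data model.

```python
import uuid

class Entity:
    """Entities carry a stable ID; scope is just a mutable parent pointer."""
    def __init__(self, name: str, parent: "Entity | None" = None):
        self.id = uuid.uuid4()   # never changes, regardless of location
        self.name = name
        self.parent = parent

    def path(self) -> str:
        # The old model: identity derived from position in the hierarchy,
        # so a move would invalidate every path-based reference.
        parts, node = [], self
        while node:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))

def move(project: Entity, new_org: Entity) -> None:
    # The whole move is one parent-pointer update. Children keep their
    # stable IDs, so references by ID survive unchanged.
    project.parent = new_org
```

Because references are resolved by ID rather than by path, the thousands of entities inside a project need no updates at all when the project's parent changes.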
This architectural shift is what allows us to preserve your execution history and audit trails while keeping project moves fast and reliable.
This is version one. The foundations are solid: projects move, access control migrates, pipelines keep running. But we're not done.
We're listening. If you use this feature and hit rough edges, we want to hear about it.
Organizational change is inevitable. The weeks of cleanup work afterward don't have to be.
Project Movement means your Harness setup can adapt as fast as your org chart does. When teams move, when projects change ownership, when you consolidate for efficiency, your software delivery follows without the migration overhead.
No more lost history. No more recreated pipelines. No more week-long "let's just rebuild everything in the new Organization" projects.
Ready to try it? Check out the step-by-step guide or jump into your Harness account and look for "Move Project" in the project menu.


At SREday NYC 2026, the ShipTalk podcast welcomed Zachary Gruenberg, Solution Engineer and Machine Identity SME at Palo Alto Networks, for a conversation about one of the fastest growing challenges in modern infrastructure: machine identity management.
Throughout the conference, much of the discussion centered on AI agents automating operational tasks—from incident response to infrastructure management. But every automated agent interacting with systems still requires credentials and access permissions.
In the episode, ShipTalk host Dewan Ahmed, Principal Developer Advocate at Harness, spoke with Zachary about how the rapid rise of AI-driven automation is creating an explosion of machine identities—and why managing them is quickly becoming a major security concern for SRE and platform teams.
In the past, identity management primarily focused on human users logging into systems.
Today, the landscape looks very different.
Modern infrastructure environments include a growing number of non-human identities such as:
Each of these components requires credentials in order to interact with infrastructure, APIs, and other services.
As organizations deploy more automation and AI-driven workflows, the number of machine identities can quickly outnumber human users by several orders of magnitude.
For SRE teams, this creates a new challenge: tracking which systems have access to what resources—and ensuring those permissions remain secure.
One of the most common problems Zachary sees is that teams prioritize functionality when deploying new automation systems.
When engineers introduce AI agents or automated workflows, identity management is often treated as an afterthought.
That approach can lead to:
To address this, Zachary encourages organizations to treat machine identity as a core component of their security architecture, rather than a secondary concern.
This often includes practices such as:
When these controls are built into the platform early, security can scale alongside automation instead of becoming a bottleneck.
Despite the growing awareness of identity security, Zachary frequently encounters one recurring issue.
Many teams simply lose track of the machine identities they have created.
Over time, environments accumulate service accounts, API keys, tokens, and automation credentials that remain active long after the systems that created them are gone.
This “identity sprawl” can create significant risk, particularly in environments where automated systems are interacting with critical infrastructure.
The challenge becomes even greater as AI agents begin performing more complex operational tasks.
Ensuring that these agents have the right level of access—and no more—requires visibility into every identity operating within the system.
As organizations adopt AI-driven automation across operations, the importance of identity security will only increase.
Each new automation tool or AI workflow adds another layer of machine identities interacting with infrastructure.
For SRE and platform teams, this means reliability engineering and security practices are becoming increasingly interconnected.
Strong machine identity management ensures that automation systems can operate safely while protecting the infrastructure they interact with.
Zachary Gruenberg’s message is a timely reminder that the growth of AI agents and automation does not eliminate the need for strong security foundations.
If anything, it makes them even more critical.
As organizations move toward more autonomous systems, understanding who—or what—has access to critical infrastructure will remain one of the most important challenges for reliability and security teams alike.
Subscribe to the ShipTalk Podcast
Enjoy conversations like this with engineers, platform builders, and reliability leaders from across the industry.
Follow ShipTalk on your favorite podcast platform and stay tuned for more stories from the people building the systems that power modern technology. 🎙️🚀


--
Key Takeaways:
The Harness MCP server is an MCP-compatible interface that lets AI agents discover, query, and act on Harness resources across CI/CD, GitOps, Feature Flags, Cloud Cost Management, Security Testing, Resilience Testing, Internal Developer Portal, and more.
--
The first wave of MCP servers followed a natural pattern: take every API endpoint, wrap it in a tool definition, and expose it to the LLM. It was fast to build, easy to reason about, and it was exactly how we built the first Harness MCP server. That server taught us a lot: solid Go codebase, well-crafted tools, broad platform coverage across 30 toolsets. It also taught us where the one-tool-per-endpoint model hits a wall.
For platforms the size of Harness, spanning the entire SDLC, the pattern doesn't scale. When you expose one tool per API endpoint, you're asking the LLM to be a routing layer, forcing it to do something a switch statement does better. Every tool definition consumes context that could be spent on reasoning. At ~175 tools, that's ~26% of the LLM's context window before the developer even types a prompt.
So we iterated. The Harness MCP v2 redesign does the same work with 11 tools at ~1.6% context consumption. The answer isn't fewer features, it's a different architecture: a registry-based dispatch model where the LLM reasons about what to do, and the server handles how to do it.
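A minimal sketch of registry-based dispatch, assuming hypothetical resource and action names: instead of one tool definition per endpoint, a single generic tool takes `(resource, action, params)` and the server routes the call with an ordinary lookup table.

```python
# One generic "execute" tool replaces dozens of per-endpoint tools.
# The LLM reasons about what to do; this table decides how to do it.
HANDLERS = {
    ("pipeline", "list"):      lambda p: ["build", "deploy"],
    ("pipeline", "run"):       lambda p: f"started {p['name']}",
    ("feature_flag", "toggle"): lambda p: f"toggled {p['flag']}",
}

def execute_tool(resource: str, action: str, params: dict):
    """Dispatch a single tool call to the right internal handler."""
    handler = HANDLERS.get((resource, action))
    if handler is None:
        raise ValueError(f"unknown operation: {resource}.{action}")
    return handler(params)
```

Only the one generic tool's schema lands in the model's context; the routing table lives server-side and can grow without consuming another token.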
When an MCP client connects to a server, it loads every tool definition into the LLM's context window. Every name, description, parameter schema, and annotation. For the first Harness server at ~130+ active tools, here's what that costs:

That's the core insight: the first server uses ~26% of context on tool definitions before any work begins. The v2 uses ~1.6%.
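The arithmetic behind those percentages is simple to reproduce. Assuming a 200K-token context window and roughly 300 tokens per tool definition (both assumptions consistent with the figures cited in this post):

```python
CONTEXT_WINDOW = 200_000  # tokens; a typical large-model window (assumption)

def context_tax(total_definition_tokens: int) -> float:
    """Fraction of the window consumed by tool metadata before any prompt."""
    return total_definition_tokens / CONTEXT_WINDOW

v1 = context_tax(175 * 300)  # ~175 tools at ~300 tokens per definition
v2 = context_tax(3_150)      # 11 tools, ~3,150 tokens of definitions total
```

That works out to roughly 26% for the first server and about 1.6% for v2, a sixteen-fold reduction in overhead for the same platform coverage.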
This isn't a theoretical concern. Research on LLM behavior in large context windows, including Liu et al.'s "Lost in the Middle" findings, shows that models struggle to use information placed deep within long contexts. As Ryan Spletzer recently wrote, dead context doesn't sit inertly: "It dilutes the signal. The model's attention is spread across everything in the window, so the more irrelevant context you pack in, the less weight the relevant context carries."
Anthropic's own engineering team has documented this trade-off: direct tool calls consume context for each definition and result, and agents scale better when the tool surface area is deliberately constrained.
The problem compounds in real-world developer environments. If you're running Cursor or Claude Code with a Playwright MCP, a GitHub MCP, and the Harness MCP, those tool definitions stack. EclipseSource's analysis shows that a standard set of MCP servers can eat 20% of the context window before you even type a prompt. The recommendation: stay below 40% total context utilization. Any MCP server with 100+ tools, ours included, would consume more than half that budget on its own.
The context window tax isn't unique to Harness: it's an industry-wide problem. Here's how the v2 server compares to popular MCP servers in the wild:

Lunar.dev research: "5 MCP servers, 30 tools each → 150 total tools injected. Average tool description: 200–500 tokens. Total overhead: 30,000–60,000 tokens. Just in tool metadata." MCP server v2 at ~3,150 tokens would represent just 5–10% of a typical multi-server setup's overhead.
Real-world Claude Code user: A developer on Reddit r/ClaudeCode with Playwright, Context7, Azure, Postgres, Zen, and Firecrawl MCPs reported 83.3K tokens (41.6% of 200K) consumed by MCP tools immediately after /clear. That's before a single prompt.
Anthropic's code execution findings: Anthropic's engineering team reported that a workflow consuming 150,000 tokens was reduced to ~2,000 tokens (a 98.7% reduction) by switching from direct tool calls to code-based tool invocation. The principle is clear: fewer, smarter tools beat more, narrower ones.
MCPAgentBench: An academic benchmark found that "nearly all evaluated models exhibit a decline of over 10 points in task efficiency when tool selection complexity increases." Models overwhelmed with tools prioritize task resolution over execution efficiency. They get the job done, but waste tokens doing it.
Cursor enforces an 80-tool cap, OpenAI limits to 128 tools, and Claude supports up to ~120. The v2 server's 11 tools leave massive headroom to run Harness alongside other MCP servers without hitting these limits.
Consider a concrete example: a developer running Cursor with Playwright (21 tools), GitHub MCP (~40 tools), and the old Harness MCP (~175 tools) would hit ~236 tools, well past Cursor's 80-tool cap. With v2 Harness (11 tools), the same stack is 72 tools, comfortably under the limit.
With Claude Code, the same old stack would burn ~76,400 tokens (~38%) on tool definitions alone. With v2, it drops to ~27,550 tokens (~14%), freeing ~48,850 tokens for actual reasoning and conversation.
The MCP ecosystem is in the middle of a reckoning. Scalekit ran 75 benchmark runs comparing CLI and MCP for identical GitHub tasks on Claude Sonnet 4, and CLI won on every efficiency metric: 10–32x cheaper, 100% reliable vs MCP’s 72%. For a simple “what language is this repo?” query, CLI used 1,365 tokens. MCP used 44,026 — almost entirely from schema injection of 43 tool definitions the agent never touched.
The Playwright team shipped the same verdict in hardware. Their new CLI tool saves browser state to disk instead of flooding context. In BetterStack’s benchmarks, CLI used ~150 tokens per interaction vs MCP’s ~7,400+ of accumulated page state. CircleCI found CLI completed browser tasks with 33% better token efficiency and a 77 vs 60 task completion score.
The CLI camp’s argument is real: schema bloat kills performance. But their diagnosis points at the wrong layer. The problem isn’t MCP. It’s naive MCP server design.
CLI wins when the agent already knows the tool. gh, kubectl, terraform: these have extensive training data. The agent composes commands from memory, pays zero schema overhead, and gets terse, predictable output. Scalekit found that adding an 800-token “skills document” to CLI reduced tool calls and latency by a third.
CLI also wins on composition. Piping grep into jq into xargs chains operations in a single tool call. An MCP agent doing the same work makes N round-trips through the LLM, each one burning context.
But CLI’s advantages dissolve the moment you cross three boundaries:
CLI works when the agent knows the command. For a platform like Harness, with 125+ resource types across CI/CD, GitOps, FinOps, security, chaos, and IDP, the agent can’t know the API surface from training data alone. MCP’s harness_describe tool lets the agent discover capabilities at runtime. CLI would require the agent to guess curl commands against undocumented APIs.
As Scalekit themselves concluded: “The question isn’t CLI or MCP. It’s who is your agent acting for?” CLI auth gives the agent ambient credentials: your token. For multi-tenant, multi-user environments (which is where Harness operates), MCP provides per-user OAuth, explicit tool boundaries, and structured audit trails.
CLI agents can run arbitrary shell commands. An MCP server constrains the agent to declared tools with typed inputs. The v2 server’s elicitation-based confirmation flows, fail-closed deletes, and read-only mode are protocol-level safety guarantees that CLI can’t replicate.
The CLI vs MCP debate is really about schema bloat and naive tool design. The v2 Harness MCP server eliminates the arguments against MCP without losing the arguments for it:
Schema bloat? 11 tools at ~3,150 tokens. That’s less than a single CLI help output for a complex tool. Cursor’s 80-tool cap? We use 11. The 44,026-token GitHub MCP problem? We’re 14x leaner.
Round-trip overhead? The registry-based dispatch means the agent makes one tool call to harness_diagnose and gets back a complete execution analysis — pipeline structure, stage/step breakdown, timing, logs, and root cause. A CLI agent would need to chain 4–5 API calls to assemble the same picture.
Discovery? harness_describe is a zero-API-call local schema lookup. The agent discovers 125+ resource types without a single network request. CLI would require a man page the agent has never seen.
Composition? Skills + prompt templates encode multi-step workflows (build-deploy-app, debug-pipeline-failure) as server-side orchestration. The agent reasons about what to do; the server handles how to chain it. Same efficiency as a CLI pipe, with protocol-level safety.
The real lesson from the benchmarks: MCP servers with 43+ tools and no architecture for context efficiency will lose to CLI on cost metrics. But a well-designed MCP server with 11 tools, a registry, and a skills layer outperforms both naive MCP and naive CLI — and provides authorization, safety, and discoverability that CLI architecturally cannot.
We stopped designing for API parity and started designing for agent usability.
The v2 server is built around a registry-based dispatch model. Instead of one tool per endpoint, we expose 11 intentionally generic verbs. The intelligence lives in the registry: a declarative data structure that maps resource types to API operations.

When an agent calls harness_list(resource_type="pipeline"), the server looks up pipeline in the registry, resolves the API path, injects scope parameters (account, org, project), makes the HTTP call, extracts the relevant response data, and appends a deep link to the Harness UI. The agent never needs to know the underlying API structure.
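The dispatch flow described above can be sketched in TypeScript. This is an illustrative reconstruction, not the server's actual code; the function names, base URL, and query-parameter names are assumptions.

```typescript
// Illustrative sketch of registry-based dispatch (names are assumptions,
// not the real Harness MCP v2 internals).
type Operation = {
  method: "GET" | "POST" | "PUT" | "DELETE";
  path: string; // may contain placeholders like {pipeline_id}
  responseExtractor: (raw: any) => any;
};

type ResourceDefinition = {
  resourceType: string;
  scope: "account" | "org" | "project";
  operations: Record<string, Operation>;
};

const registry = new Map<string, ResourceDefinition>();

// Pure helper: substitute {placeholders} in an API path from params.
function resolvePath(template: string, params: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_, key) => params[key] ?? "");
}

// Dispatch: look up the resource, resolve the path, inject scope, call the API.
async function dispatch(
  verb: string,
  resourceType: string,
  params: Record<string, string>
): Promise<any> {
  const def = registry.get(resourceType);
  if (!def) throw new Error(`Unknown resource type: ${resourceType}`);
  const op = def.operations[verb];
  if (!op) throw new Error(`${resourceType} does not support ${verb}`);

  const url = new URL(resolvePath(op.path, params), "https://app.harness.io");
  // Scope identifiers (account, org, project) ride along as query parameters.
  for (const key of ["accountIdentifier", "orgIdentifier", "projectIdentifier"]) {
    if (params[key]) url.searchParams.set(key, params[key]);
  }

  const res = await fetch(url, { method: op.method });
  return op.responseExtractor(await res.json());
}
```

With this shape, a call like harness_list(resource_type="pipeline") becomes dispatch("list", "pipeline", params): the LLM picks the verb and resource type, and everything downstream is mechanical lookup.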
Each registry entry is a declarative ResourceDefinition:
{
  resourceType: "pipeline",
  displayName: "Pipeline",
  toolset: "pipelines",
  scope: "project",
  identifierFields: ["pipeline_id"],
  operations: {
    list: {
      method: "GET",
      path: "/pipeline/api/pipelines/list",
      queryParams: ["search_term", "page", "size"],
      responseExtractor: (raw) => raw.content
    },
    get: {
      method: "GET",
      path: "/pipeline/api/pipelines/{pipeline_id}",
      responseExtractor: (raw) => raw.data
    }
  }
}
Adding support for a new Harness module requires adding one declarative object to the registry. No new tool definitions. No changes to MCP tool schemas. The LLM's tool vocabulary stays constant as the platform grows.
Today, the registry covers 125+ resource types across 30 toolsets, spanning the full Harness platform.
The architecture wasn't designed in a vacuum. We built it specifically for the environments developers actually use.
Cursor and Windsurf connect via stdio transport — the server runs as a local process alongside the IDE. With 11 tools instead of 130+, the Cursor agent has a minimal, clear menu. It doesn't waste reasoning cycles on tool selection or get confused by 40 CCM-specific tools when the developer is debugging a pipeline failure.
For teams that only use specific Harness modules, HARNESS_TOOLSETS lets you filter at startup:
{
  "mcpServers": {
    "harness": {
      "command": "npx",
      "args": ["-y", "harness-mcp-v2@latest"],
      "env": {
        "HARNESS_API_KEY": "pat.xxx.yyy.zzz",
        "HARNESS_TOOLSETS": "pipelines,services,connectors"
      }
    }
  }
}
The agent only sees resource types from the enabled toolsets. The rest don't exist as far as the LLM is concerned.
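As a sketch of what that startup filter might look like (illustrative only; the real server's implementation is not shown here):

```typescript
// Hypothetical sketch of HARNESS_TOOLSETS filtering at startup.
type ResourceDefinition = { resourceType: string; toolset: string };

function filterByToolsets(
  all: ResourceDefinition[],
  toolsetsEnv: string | undefined
): ResourceDefinition[] {
  if (!toolsetsEnv) return all; // no filter set: every toolset is enabled
  const enabled = new Set(toolsetsEnv.split(",").map((t) => t.trim()));
  return all.filter((def) => enabled.has(def.toolset));
}

const demo: ResourceDefinition[] = [
  { resourceType: "pipeline", toolset: "pipelines" },
  { resourceType: "service", toolset: "services" },
  { resourceType: "cost_anomaly", toolset: "ccm" },
];

filterByToolsets(demo, "pipelines,services").map((d) => d.resourceType);
// → ["pipeline", "service"]; with the variable unset, all three remain
```

Because filtering happens before tool registration, disabled resource types never reach the LLM's context at all.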
Claude Code excels at multi-step workflows. We leaned into that with 26 prompt templates across four categories.
Each prompt template encodes a multi-step workflow the agent can execute. debug-pipeline-failure doesn't just fetch an execution — it calls harness_diagnose, follows chained failures, and produces a root cause analysis with actionable fixes.
The v2 server also supports multi-project workflows without hardcoded environment variables. An agent can dynamically discover the account structure, then scope subsequent calls with org_id and project_id parameters. No configuration changes needed.
Every tool accepts an optional url parameter. Paste a Harness UI URL, a pipeline page, an execution log, a dashboard, and the server automatically extracts the account, org, project, and resource identifiers. The agent gets context without the developer having to specify it manually.
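A URL parser for this could look roughly like the following. The exact Harness URL layout varies by module, so the patterns here are assumptions based on the common account/org/project path shape, not the server's actual parsing code.

```typescript
// Hypothetical sketch of scope extraction from a Harness UI URL.
// Real URL layouts vary by module; these patterns are illustrative.
function extractScope(url: string): { account?: string; org?: string; project?: string } {
  const match = (re: RegExp) => url.match(re)?.[1];
  return {
    account: match(/account\/([^/]+)/),
    org: match(/orgs\/([^/]+)/),
    project: match(/projects\/([^/]+)/),
  };
}

const scope = extractScope(
  "https://app.harness.io/ng/account/abc123/cd/orgs/default/projects/demo/pipelines"
);
// scope = { account: "abc123", org: "default", project: "demo" }
```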
Reducing tool count solves the context efficiency problem. But developers don't just need fewer tools — they need tools that know how to chain together into real workflows. That's where Harness Skills come in.
The v2 server ships with a companion skills layer (github.com/thisrohangupta/harness-skills) that turns raw MCP tool access into guided, multi-step workflows. Skills are IDE-native agent instructions that teach the AI how to use the MCP server effectively — without the developer having to explain Harness concepts or orchestration patterns.
Skills operate at three levels:
At the first level, every IDE gets a base instruction file, loaded automatically when the agent starts. These files teach the agent what the 11 tools do, how Harness scoping works (account → org → project), how to order dependencies (always verify referenced resources exist before creating dependents), and how to extract context from Harness UI URLs.
At the second level are the 26 MCP prompt templates registered directly in the server. Any MCP client can invoke them. They encode multi-step workflows with phase gates; for example, build-deploy-app structures a 4-phase workflow (clone → scan → CI pipeline → deploy) with explicit "do not proceed until this step is done" checkpoints.
At the third level are specialized SKILL.md files that function as slash commands in the IDE. Each skill includes YAML frontmatter (trigger phrases, metadata), phased instructions, worked examples, performance notes, and troubleshooting steps.
Without skills, a developer says "deploy my Node.js app" and the agent has to figure out the right Harness concepts, the correct ordering, and the proper API calls from scratch. With skills, the flow is:
harness_list / harness_create / harness_execute calls.

The skills layer delivers three measurable improvements:
Without skills, the agent typically needs 3–5 exploratory tool calls to understand Harness's resource model before starting real work. Skills encode this knowledge upfront — the agent knows to check for existing connectors before creating a pipeline, to verify environments exist before deploying, and to use harness_describe for schema discovery instead of trial-and-error.
Harness resources have strict dependency chains (connector → secret → service → environment → infrastructure → pipeline → trigger). Skills encode the 7-step "Deploy New Service" and 8-step "New Project Onboarding" workflows as ordered sequences. The agent doesn't discover dependencies through failures, it follows the prescribed order.
Each failed API call and retry burns tokens. Skills eliminate the most common failure modes (wrong scope, missing dependencies, incorrect parameter formats) by teaching the agent the patterns before execution. The combination of 11 tools (minimal context overhead) plus skills (minimal wasted calls) means more of the context window is available for the developer's actual task.
The first Harness MCP server (harness/mcp-server) pioneered the IDE-native pattern with a review-mcp-tool command that works across Cursor, Claude Code, and Windsurf via symlinked definitions:
One canonical definition in .harness/commands/, symlinked to all three. Update once, propagate everywhere.
The v2 skills layer extends this pattern from developer-tool commands to full DevOps workflows, the same "define once, deploy to every IDE" architecture, applied to pipeline creation, deployment debugging, cost analysis, and security review.
MCP servers that can create, update, and delete resources need safety guardrails. We built them in from the start.
Human-in-the-loop confirmation: All write operations use MCP elicitation to request explicit user confirmation before executing. The agent presents what it intends to do; the developer approves or rejects.
Fail-closed destructive operations: harness_delete is blocked entirely if the MCP client doesn't support elicitation. No silent deletions.
Read-only mode: Set HARNESS_READ_ONLY=true for shared environments, demos, or when you want agents to observe but not act.
Secrets safety: The secret resource type exposes metadata (name, type, org, project) but never the secret value itself.
Rate limiting and retries: Configurable rate limits (default: 10 req/s), automatic retries with backoff for transient failures, and bounded pagination to prevent runaway list operations.
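In outline, the fail-closed delete behavior looks like this. The `Client` shape and `safeDelete` helper are illustrative assumptions, not the MCP SDK's actual API:

```typescript
// Illustrative fail-closed delete: no elicitation support means no delete.
type Client = {
  supportsElicitation: boolean;
  confirm(message: string): Promise<boolean>; // MCP elicitation round-trip
};

async function safeDelete(
  client: Client,
  resourceType: string,
  id: string,
  doDelete: (id: string) => Promise<void>
): Promise<string> {
  if (!client.supportsElicitation) {
    // Fail closed: refuse rather than delete silently.
    return `Refused: client cannot confirm deletion of ${resourceType}/${id}`;
  }
  const approved = await client.confirm(`Delete ${resourceType} "${id}"?`);
  if (!approved) return "Cancelled by user";
  await doDelete(id);
  return `Deleted ${resourceType}/${id}`;
}
```

The key design choice is the default: when confirmation is impossible, the destructive path is unreachable, rather than quietly permitted.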
The v2 server supports two transports: stdio for local IDE use and HTTP for remote team deployments.
For team deployments, the HTTP transport is compatible with MCP gateways like Portkey, LiteLLM, and Envoy-based proxies, enabling shared control planes with centralized auth, observability, and policy enforcement.
# Local (Cursor, Claude Code)
npx harness-mcp-v2@latest
# Remote (team deployment)
npx harness-mcp-v2@latest http --port 3000
# Docker
docker run -e HARNESS_API_KEY=pat.xxx.yyy.zzz harness-mcp-v2
The shift from 130+ tools to 11 isn't about simplification for its own sake. It's about recognizing that the best MCP servers are capability-oriented agent interfaces, not API mirrors.
Building the first Harness MCP server taught us the same lesson the broader ecosystem is learning: when you expose one tool per API endpoint, you're asking the LLM to be a routing layer. You're consuming context on definitions that could be used for reasoning. And you're fighting against the LLM's actual strengths, reasoning, planning, and multi-step problem solving, by forcing it to do something a switch statement does better. That first server made the cost concrete. The v2 is our answer.
The registry pattern inverts this. The tool vocabulary is stable: 11 verbs today, 11 verbs when Harness ships 50 more resource types. The registry is extensible. The skills layer is composable. The LLM reasons about what to do, and the server handles how to do it. That's not just an efficiency win — it's the correct division of labor between an LLM and a server.
This is the pattern we think more MCP servers should adopt, especially platforms with broad API surfaces. The MCP specification itself is built on the idea that servers expose capabilities, not endpoints. We took that literally.
The efficiency gains from the v2 architecture translate directly into concrete, time-saving use cases for developers operating within their IDEs. The combination of a minimal tool surface (11 tools), deep resource knowledge (125+ resource types), and pre-encoded workflows (Harness Skills) allows the agent to handle complex DevOps tasks with minimal guidance.
Example use cases:
Debug a Failed CI Pipeline: Get root cause and logs for a pipeline run.
Onboard New Service: Create a Service, Environment, Infrastructure, and initial Connector.
Review Cloud Cost Anomaly: Investigate a sudden spike in cloud spend.
Check Compliance Status: Verify a service's SBOM compliance against OPA policies.
Deploy App to Prod: Execute a canary deployment pipeline.
npx harness-mcp-v2@latest
Configure with your Harness PAT (account ID is auto-extracted):
HARNESS_API_KEY=pat.<accountId>.<tokenId>.<secret>
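Since the PAT embeds the account ID as its second segment, auto-extraction is a simple split. A minimal sketch (illustrative, not the server's actual parsing code):

```typescript
// Harness PATs look like pat.<accountId>.<tokenId>.<secret>;
// the account ID is the second dot-separated segment.
function extractAccountId(apiKey: string): string {
  const parts = apiKey.split(".");
  if (parts.length < 4 || parts[0] !== "pat") {
    throw new Error("Not a Harness PAT");
  }
  return parts[1];
}

extractAccountId("pat.abc123.tok456.s3cret");
// → "abc123"
```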
Full source: github.com/thisrohangupta/harness-mcp-v2
Official Harness MCP Server: github.com/harness/mcp-server
---
The Harness MCP server is an MCP-compatible server that lets AI agents interact with Harness resources using a small set of generic tools.
Each exposed tool adds metadata to the model context. A smaller tool surface leaves more room for reasoning and task execution.
Instead of exposing one tool per API endpoint, it uses 11 generic tools plus a registry that maps resource types to the correct API operations.
The post mentions Cursor, Claude Code, Claude Desktop, Windsurf, Gemini CLI, and other MCP-compatible clients.
The design includes write confirmations, fail-closed delete behavior, read-only mode, and controls for retries, rate limiting, and deployment transport.
CI/CD tools are software platforms that automate code integration, testing, release preparation, and deployment. They connect source control, build systems, test frameworks, and runtime environments into a repeatable delivery pipeline.
CI/CD tools sit at the center of how modern teams ship software. Instead of pushing risky, manual releases once a month, you automate builds, tests, and deployments so every change follows the same, reliable path to production. Done right, CI/CD turns release day from an “all‑hands fire drill” into just another commit.
In this guide, we will walk through what CI/CD tools are, the key features that actually matter, and how to choose the right platform for your stack.
Along the way, we will show how platforms like Harness Continuous Integration and Harness Continuous Delivery & GitOps bring AI, governance, and deep insights together so you can ship faster without losing control.
CI/CD tools are the backbone of modern software delivery. They automate the process of building, testing, and deploying code, so changes can move from commit to production with minimal friction.
At a minimum, effective CI/CD tools automate builds, run tests on every change, and make deployments repeatable across environments.
To go deeper on pipelines themselves, see our guide on the basics of CI/CD pipelines.
The importance of CI/CD tools in today's software development ecosystem is hard to ignore. They address several challenges teams face every day, from late-breaking integration pain to slow, risky releases.
Martin Fowler defined Continuous Integration (CI) as “a software development practice where each member of a team merges their changes into a codebase together with their colleagues' changes at least daily.” Each integration triggers automated builds and tests, allowing teams to detect and address integration issues early. This approach helps maintain a consistently stable codebase and reduces the time and effort required for integration at later stages of development.
Modern CI/CD tools extend this by making those builds faster and more insightful, surfacing exactly which tests or components were impacted by a given change.
The "CD" in CI/CD can stand for either Continuous Delivery or Continuous Deployment. While closely related, these concepts have distinct implications for the software release process.
Continuous Delivery is an extension of continuous integration. It automates the process of preparing code changes for release to production. In continuous delivery, every change that passes automated tests is kept in a production-ready state and can be deployed at any time. Changes are typically promoted through pre-production environments where additional tests and security scans run, with a manual approval step before the final push to production.
Teams often rely on CI/CD tools with strong approval workflows and policy controls here, so releases stay safe without turning into ticket‑driven bottlenecks.
Continuous Deployment takes automation a step further. In this model, every change that passes the automated tests is automatically deployed to production without manual intervention.
This approach requires a high degree of confidence in the testing process and can significantly reduce the time between writing code and seeing it live in production.
In practice, only teams with mature testing, monitoring, and rollback capabilities should aim for full continuous deployment.
Not all CI/CD tools solve the same problems. When you compare options, focus on a few core dimensions: speed of feedback, depth of integration with your Git provider and stack, governance and policy controls, and the ability to scale with your team.
While CI/CD and DevOps are often mentioned in the same breath, they are not synonymous. CI/CD refers to specific practices and tools within the software development lifecycle, while DevOps is a broader cultural and operational philosophy.
DevOps aims to break down barriers between development and operations teams, fostering collaboration and shared responsibility. CI/CD practices are a key component of DevOps, but DevOps encompasses a wider range of principles and practices aimed at improving overall software delivery and operational performance.
Think of CI/CD tools as the automation layer that makes DevOps ways of working real in day‑to‑day delivery.
CI/CD security is a critical consideration in modern software development. It involves implementing security measures throughout the CI/CD pipeline to protect against vulnerabilities and ensure the integrity of the software delivery process. This includes automated security scanning, policy enforcement, and control over who can deploy what.
By integrating security into the CI/CD pipeline, organizations can shift security left, addressing potential issues earlier in the development process and reducing the risk of security breaches in production environments. For more information, check out DevSecOps in the Harness Academy.
If you are building or modernizing pipelines today, plan security into your CI/CD tools selection from day one.
Advanced platforms also bring AI into this space. Harness, for example, offers AI‑assisted deployment verification that automatically analyzes metrics and logs during deployments to catch anomalies and trigger safe rollbacks.
The CI/CD tooling landscape is diverse, offering solutions for various needs and preferences, from self-hosted open-source servers to fully managed cloud platforms.
Each CI/CD tool has strengths. The right choice depends on your existing ecosystem, team skills, compliance needs, and appetite for maintaining tooling.
A practical evaluation process for CI/CD tools starts from your requirements: shortlist platforms against your stack and compliance needs, pilot each candidate on a representative pipeline, and measure feedback speed, reliability, and maintenance overhead.
If you are comparing cloud‑hosted vs self‑managed approaches, our article on cloud-based CI/CD options outlines trade‑offs across control, cost, and operational overhead.
Harness stands out in the CI/CD tooling landscape as a comprehensive Software Delivery Platform that addresses the complexities of modern software development. In practice, that means unified CI, CD, and GitOps with policy-driven governance, AI-assisted verification, and delivery insights in one place.
By adopting Harness as your CI/CD tools platform, you can streamline software delivery, improve code quality, and accelerate time to market while still meeting strict security and governance requirements.
CI/CD tools are software systems that automate how code is built, tested, and deployed. They connect your source control, test suites, and runtime environments into a repeatable pipeline so every change follows the same path to production.
Many CI/CD tools focus just on automation for builds and deployments. DevOps platforms go further with governance, security, cost controls, and developer self‑service. Harness combines both, so you do not need a separate stack of ad‑hoc scripts and point tools.
Yes. Even very small teams benefit from automated builds and tests. Manual steps are fragile and do not scale. Starting with CI/CD tools early keeps quality high and avoids painful rewrites of your delivery process later.
They provide consistent places to run security scans, enforce policies, and control who can deploy what. When combined with DevSecOps practices and capabilities like AI‑assisted verification, CI/CD tools help catch vulnerabilities before they hit customers.
Look for fast feedback, strong integration with your Git provider, clear governance stories, and evidence that the tool can handle your scale. AI‑driven insights and good observability into pipelines are now table stakes for serious teams.
Traditional tools often require heavy scripting and manual integration. Harness focuses on intelligent automation, policy‑driven governance, and a unified platform that covers CI, CD, and insights in one place, so platform teams can standardize delivery without slowing developers down.
With Harness Continuous Delivery & GitOps, you can create reusable templates for rolling deployment pipelines, link them to observability tools, and use AI to help with verification. Harness checks metrics and logs at every step of a rollout and can automatically pause or roll back if there are any problems. This makes rolling deployment a low-effort, repeatable process.
In GitOps, manifests stored in Git describe how rolling deployments should work, and tools like Argo CD make sure that the desired state is reflected in Kubernetes clusters. Platforms like Harness GitOps add enterprise-level visibility, governance, and promotion workflows to Argo CD. This makes it easier to run rolling deployments on a large scale across many services and clusters.
Modern software delivery has dramatically accelerated. AI-assisted development, automated CI/CD pipelines, and cloud-native architectures have made it possible for teams to deploy software dozens of times per day.
But speed alone does not guarantee reliability.
At Conf42 Site Reliability Engineering (SRE) 2026, Uma Mukkara, Head of Resilience Testing at Harness and co-creator of LitmusChaos, delivered a clear message: outages are inevitable. In modern distributed systems, assuming your design will always work is not just optimistic—it’s risky.
In fact, as Uma put it, failure in distributed systems is a mathematical certainty.
That’s why resilience testing must become a core, continuous practice in the Software Development Life Cycle (SDLC).
Even the most reliable cloud providers experience outages.
Uma illustrated this with real-world outage examples that highlight how unpredictable failures can be.
These incidents demonstrate an important reality: the types of failures constantly evolve.
A system validated during design may not be resilient against tomorrow’s failure scenarios. Architecture may stay the same, but the failure patterns surrounding it continuously change.
This is why resilience cannot rely on assumptions.
Hope is not a strategy—verification is.
For a deeper look at this broader approach to resilience, see how chaos engineering, load testing, and disaster recovery testing work together.
Resilience is often misunderstood as simply keeping systems online.
But uptime alone does not make a system resilient.
Uma defines resilience more precisely:
Resilience is the grace with which systems handle failure and return to an active state.
In practice, a resilient system must handle three categories of disruption:
Pod crashes, node failures, infrastructure disruptions, or network faults.
Traffic spikes or sudden demand that pushes systems to their limits.
Regional outages, multi-AZ failures, or infrastructure loss that require recovery mechanisms.
If teams test only one of these dimensions, they leave significant risks undiscovered.
True resilience requires verifying how systems behave across all three scenarios.
One of the biggest challenges Uma highlighted is how organizations treat resilience.
Many teams still see it as a “day-two problem”—something SREs will handle after systems are deployed.
Others assume that once resilience has been validated during system design, the problem is solved.
In reality, resilience must be continuously verified.
As systems evolve with each release, so do their failure modes. The most effective strategy is to verify resilience continuously, treating every release as a chance to re-validate failure behavior.
This approach shifts resilience testing into the outer loop of the SDLC, alongside functional and performance testing.
Instead of waiting for production incidents, teams proactively identify weaknesses before customers experience them.
Uma introduced an important concept: resilience debt.
Resilience debt is similar to technical debt. When teams postpone resilience validation, they leave hidden risks unresolved in the system.
Over time, that debt accumulates.
And when failure eventually occurs—which it inevitably will—the business impact grows proportionally to the resilience debt that was ignored.
The only way to reduce this risk is to steadily increase resilience testing coverage over time.
As testing matures across multiple quarters, organizations gain better feedback about system behavior, uncover more risks earlier, and continuously reduce the likelihood of severe outages.
Another key takeaway from Uma’s session is that resilience testing should not happen in silos.
Many organizations treat chaos testing, load testing, and disaster recovery validation as separate initiatives owned by different teams.
But the most meaningful risks often appear when these scenarios intersect.
For example, a failover that happens during peak load can expose weaknesses that neither a chaos experiment nor a load test would reveal on its own.
That's why resilience testing must be approached as a holistic practice, combining chaos engineering, load testing, and disaster recovery validation.
You can explore the fundamentals of resilience testing in the Harness documentation.
Resilience testing also requires collaboration across multiple roles.
Developers, QA engineers, SREs, and platform teams all contribute to validating system reliability.
Uma pointed out that many organizations already share infrastructure for testing but run different experiments independently. Coordinating these efforts lets teams share environments, test assets, and findings instead of duplicating work.
Resilience becomes significantly stronger when personas, environments, and test assets are shared rather than siloed.
As systems become more complex, another challenge emerges: knowing what to test and when.
Large organizations may have hundreds of potential experiments, making it difficult to prioritize testing effectively.
Uma described how agentic AI systems can help address this challenge.
By analyzing an organization's internal knowledge sources, AI systems can recommend which resilience experiments to run, where to run them, and when.
These recommendations allow teams to run the right tests at the right moment, improving resilience coverage without overwhelming engineering teams.
To support this holistic approach, Harness has expanded its original Chaos Engineering capabilities into a broader platform: Harness Resilience Testing.
The platform integrates multiple testing disciplines, spanning chaos, load, and disaster recovery validation, in a single environment.
By combining these capabilities, teams gain a single pane of glass for identifying resilience risks across the SDLC.
This unified view allows organizations to track trends in system reliability and proactively address weaknesses before they turn into production incidents.
Uma closed the session with a clear conclusion: resilience testing is not optional.
Outages will happen. Infrastructure will fail. Traffic patterns will change. Dependencies will break.
What matters is whether organizations have continuously validated how their systems behave when those failures occur.
The more resilience testing coverage teams build over time, the more feedback they receive—and the lower the potential business impact becomes.
In modern software delivery, resilience is no longer just a reliability practice.
It is a core discipline of the enterprise SDLC.
Ready to start validating your system’s resilience?
Explore Harness Resilience Testing and start validating reliability across your SDLC.
E2E Testing Has a New Bottleneck, and It's Not the Code
End-to-end (E2E) testing has always been the hardest part of a QA strategy. You're simulating real users, navigating real flows, validating real outcomes across browsers, environments, and data states that never hold still.
Traditional test automation tackled this with scripts: rigid, deterministic sequences tied to element selectors and hard-coded values. They worked until the UI changed. Or the data changed. Or a new team member touched the wrong locator. The result: flaky, expensive test maintenance cycles that teams quietly stopped trusting.
AI-driven test automation promised to fix this. And it has, but only for the teams that figured out the new bottleneck. It's not the model. It's not the tooling. It's prompt engineering.
In AI test automation, you don't write scripts anymore; you write instructions. And the quality of those instructions determines everything that follows.
In general AI usage, a prompt is the input you give to get an output. In intelligent test automation, it's much more specific: a prompt is a natural language instruction that tells the AI testing engine what to do, what to verify, and how to handle what it finds.
A complete, well-formed test prompt for E2E automation includes five ingredients:
Goal
What business outcome is being tested? (e.g., 'User completes checkout with a promo code applied')
Context
Where does the test start? What preconditions exist? What user state or data should be assumed?
Specifics
Exact values, field names, amounts, account types, formats, and no ambiguity about inputs or expected data.
Assertion
What does success look like? A confirmation message? A balance update? A redirect to a specific URL?
Boundaries
What should the AI NOT do? What's out of scope for this particular test step?
Miss any one of these, and you've handed the AI a half-built blueprint. It will fill in the gaps, just not necessarily the way you intended.
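To make the five ingredients concrete, here is a minimal sketch of a prompt template that refuses to render until every ingredient is present. The class name, field names, and rendering format are illustrative assumptions, not part of any testing tool's API.

```python
from dataclasses import dataclass

@dataclass
class TestPrompt:
    """A well-formed E2E test prompt built from the five ingredients."""
    goal: str        # the business outcome under test
    context: str     # starting point and preconditions
    specifics: str   # exact values, fields, and formats
    assertion: str   # what success looks like
    boundaries: str  # what the AI must NOT do

    def validate(self) -> list[str]:
        """Return the names of any missing ingredients."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        missing = self.validate()
        if missing:
            raise ValueError(f"incomplete prompt, missing: {missing}")
        return (f"{self.context} {self.goal} Use {self.specifics} "
                f"Verify: {self.assertion} Do not: {self.boundaries}")

prompt = TestPrompt(
    goal="complete checkout with promo code SAVE20 applied.",
    context="Starting from the cart page with one item totaling $100.00,",
    specifics="promo code SAVE20 in the 'Promo Code' field.",
    assertion="the order total shows $80.00 after the discount.",
    boundaries="modify saved payment methods or shipping addresses.",
)
```

The point of the template is the failure mode: a prompt with a blank ingredient fails loudly at authoring time instead of silently at run time.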
Here's the fundamental truth of AI-driven testing: non-deterministic prompts produce non-deterministic tests. And non-deterministic tests are worse than no tests at all; they create false confidence and burn engineering time chasing phantom failures.

The good news: prompt quality is entirely within your control. Unlike flaky network conditions or unpredictable UI re-renders, a badly written prompt is just a rewrite away from being a reliable one. This is the foundation of self-healing tests. Better prompts dramatically increase the likelihood that the tests can self-heal. Let's break down where prompts go wrong and right.
✅ EFFECTIVE PROMPT
"Navigate to the checkout page, apply promo code SAVE20, and verify the order total shows $80.00 after the discount is applied from $100.00."
❌ WEAK PROMPT
"Go to checkout and check the discount works."
✅ EFFECTIVE PROMPT
"Click on the row in the Orders table where the Status column shows 'Completed,' and the Order ID matches the ORDER_ID parameter."
❌ WEAK PROMPT
"Click on the completed order."
✅ EFFECTIVE PROMPT
"After the payment confirmation spinner disappears, assert: Is the text 'Payment Successful' visible on screen?"
❌ WEAK PROMPT
"Check that payment worked."
Pattern 1: The Intent + Outcome Pattern
Lead with the business intent, end with the verifiable outcome. This structure forces you to be clear about both what you're doing and how you'll know it worked.
"Complete a standard checkout as a guest user with item SKU-4421, shipping to postcode 90210, and verify the order confirmation page displays an order number."
Why it works: The AI knows the starting intent, the data to use, and exactly what constitutes success. No room for interpretation.
Pattern 2: The Precondition Guard
State what must be true before the test action begins. This prevents cascading failures caused by the AI attempting steps when the application isn't in the right state.
"Given the user is logged in and has at least one saved payment method, navigate to the subscription renewal page and click 'Renew Now'."
Why it works: Guards against false failures. If the precondition isn't met, the test fails meaningfully, not mysteriously.
Pattern 3: Content-Based References (Not Positions)
Never reference UI elements by their position on screen. Reference them by their visible content, label, or semantic role. This is the single biggest driver of self-healing tests and reduces test maintenance dramatically.
✅ EFFECTIVE PROMPT
"Select the product named 'Wireless Mouse' from the search results."
❌ WEAK PROMPT
"Select the second item in the search results."
Why it works: Lists reorder. Pages change. Content-based references survive both.
Pattern 4: Atomic Assertions
One assertion should test one condition. Compound assertions ('check that X is visible AND says Y AND the button is enabled') are harder for the AI to evaluate cleanly and produce confusing failure messages.
"Is the error message 'Invalid credentials' visible below the login form?"
Not: 'Is the error message visible and does it say Invalid credentials and is the login button still enabled?' Split these into three separate assertions.
Pattern 5: The Fallback Instruction
For data that may not always exist (discounts, optional fields, conditional UI elements), always specify what the AI should do when that data is absent.
"Extract the promotional banner text into PROMO_TEXT, or set PROMO_TEXT to 'none' if no promotional banner is displayed on the page."
Why it works: Tests that handle absence are far more stable across different data states and environments.
Harness AI Test Automation (AIT) is one of the most complete implementations of prompt-driven E2E testing available today. It reduces the need to manually script Selenium/Playwright flows with an intent-driven model: you describe what a user wants to achieve, and Harness AI figures out how to test it.
The platform is built on an agentic AI testing architecture, an autonomous testing system that blends LLM reasoning with real-time application exploration, DOM analysis, and screenshot-based visual validation. What makes it especially relevant to this discussion is that Harness AIT exposes the quality of your prompts directly: write a vague intent, get an unreliable test. Write a precise one, get a test that runs stably in your CI/CD testing pipeline.

"Rather than scripting every step of 'add item to cart and checkout,' a tester writes: Verify that a user can add an item to the cart and complete checkout successfully. The AI testing tool interprets the intent and executes the full flow, including assertions."
Harness structures AI instructions into four command types for codeless test automation. Each has its own prompting rules; get them right and your tests become dramatically more stable.
AI Assertion - Verify application state at a specific point in execution
Write it like this:
"In the confirmation dialog, is the deposit amount displayed as $100.00?"
Avoid this:
"Is the amount correct?" The AI has no memory of what amount was entered.
AI Command - Perform a specific, discrete UI interaction
Write it like this:
"After the loading spinner disappears, click the 'Continue' button in the payment form."
Avoid this:
"Click Continue." Which Continue? What if it's not ready yet?
AI Task - Execute a complete multi-step business workflow
Write it like this:
"Transfer $500 from Savings to Checking, confirm the transaction and verify both balances are updated correctly."
Avoid this:
"Transfer money between accounts." This is missing values, account names, and success criteria.
AI Extract Data - Capture dynamic values for use in subsequent test steps
Write it like this:
"Create parameter ORDER_ID and assign the order number from the confirmation message on this page."
Avoid this:
"Get the order number." Stored where? From which element?
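The four command types each have a distinct prompt shape, and those shapes can be checked mechanically. The sketch below is a hedged illustration: the enum values and heuristics are my own, not Harness's internal representation.

```python
from enum import Enum

class AICommandType(Enum):
    ASSERTION = "ai_assertion"   # verify application state at one point
    COMMAND = "ai_command"       # one discrete UI interaction
    TASK = "ai_task"             # a multi-step business workflow
    EXTRACT = "ai_extract_data"  # capture a value into a parameter

def check_prompt(kind: AICommandType, prompt: str) -> list[str]:
    """Flag shape problems per command type (simple heuristics only)."""
    issues = []
    if kind is AICommandType.ASSERTION and not prompt.rstrip().endswith("?"):
        issues.append("assertions read best as yes/no questions")
    if kind is AICommandType.EXTRACT and "parameter" not in prompt.lower():
        issues.append("extraction should name a target parameter")
    if kind is AICommandType.TASK and "verify" not in prompt.lower():
        issues.append("tasks need an explicit success criterion")
    return issues
```

Running the effective prompts from above through a check like this returns no issues; the weak ones are flagged before they ever reach the AI engine.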
When you submit an intent-driven prompt to Harness AIT, it goes through a five-stage pipeline, and the quality of your prompt shapes every stage:
1. Interpret
The LLM Interface Layer reads your natural language prompt and formulates a structured test intent. Vague prompts produce ambiguous intents.
2. Explore
The AI queries its App Knowledgebase (Application Context) to find relevant pages and flows. Specific context in your prompt narrows this search dramatically.
3. Execute
Each step is translated into an executable action. Content-based references in your prompt produce resilient steps. Positional ones produce fragile ones.
4. Validate
DOM and screenshot-based validation confirms both functional and visual state. Your assertion prompts define exactly what gets checked.
5. Learn
Each run updates the App Knowledgebase. Better prompts produce richer, more accurate knowledge, improving future test case generation and reducing test maintenance.
These are the most common prompt antipatterns seen in AI-driven E2E testing, each one a reliable way to introduce flakiness:
Positional References
Saying 'click the third row' or 'select the first option' creates tests that break every time data changes or UI reorders.
Missing Context
Assertions like 'Is the amount correct?' fail because the AI might not have any memory of previous steps. Restate the expected values in assertions. Every prompt must be self-contained.
Compound Assertions
Checking multiple conditions in one assertion makes failures ambiguous. One assertion, one condition, always.
No Success Criteria
Tasks like 'register a new user' without specifying what success looks like leave the AI guessing when to stop.
Assumed Data Formats
Not specifying 'extract the total as a number without a currency symbol' means you might get '$1,234.56' when you needed '1234.56'.
Ignoring Timing
Not accounting for loading states ('after the spinner disappears') is one of the top causes of intermittent test failures.
End-to-end testing has always required precision. The medium has changed, from XPath selectors and coded steps to natural language testing instructions, but the requirement for precision hasn't. If anything, the stakes are higher because a poorly written prompt now fails invisibly: the AI will attempt something, just not what you intended.
The teams getting the most out of AI test automation are not the ones with the most sophisticated models. They're the ones who've learned to write clear, specific, self-contained instructions through effective prompt engineering: the ones who know the difference between 'click the third button' and 'click the Submit button in the payment form,' and who end every assertion with a question mark and every task with a success criterion.
Platforms like Harness AI Test Automation are built to reward exactly this kind of precision, turning well-crafted prompts into stable, self-healing tests that are CI/CD testing-ready and survive the real world with minimal test maintenance.
"The art of prompt engineering isn't about clever wording. It's about transferring your intent, completely and unambiguously, to an autonomous testing system that will act on every word you write."
Write with that precision, and your intelligent test automation will finally be the safety net it was always meant to be.
Harness AI Test Automation empowers teams to move faster with confidence. Key benefits include:
Harness AI Test Automation turns traditional QA challenges into opportunities for smarter, more reliable automation, enabling organizations to release software faster while maintaining high quality.
If you're ready to eliminate flaky tests, simplify maintenance, and improve test reliability with intent-driven, natural-language testing, try Harness AI Test Automation today or contact our team to see how it can transform your testing experience.


---
Key Takeaways
---
AI can generate code in seconds. It still can’t ship software safely.
That gap isn’t about model quality or prompt engineering. It’s about context, and most software organizations don’t have a system that accurately reflects how pipelines, services, environments, policies, and teams actually relate to each other.
Without that context, AI doesn’t automate delivery. It amplifies risk.
I'm responsible for building the Knowledge Graph that powers Harness AI, and working on AI infrastructure and data platforms here, I see this pattern every day: AI-first delivery fails not because of intelligence, but because of fragmentation.
Modern engineering organizations already generate more data than any human can reason about:
Each system works. The problem is that none of them agree on what the system actually is.
When something breaks, we don’t query systems. We page people. That’s the clearest signal you’ve hit the context bottleneck. When your organization depends on a few humans to resolve incidents, you don’t have a tooling problem. You have a context problem.
Most teams today operate in AI-assisted DevOps:
That’s helpful, but shallow.
AI-operational DevOps is different. Here, AI doesn’t just assist tasks. It understands how software actually moves from commit to production, including constraints, dependencies, and governance.
The difference is a platform problem. Without a shared context layer, AI remains a collection of point optimizations. With one, it becomes an operator.
Context is not dashboards. It’s not a data lake. And it’s definitely not another CMDB.
In practice, context means entities and relationships.
In DevSecOps environments, the most critical entities are:

Pipelines are often the natural center — not because they’re special, but because they express intent.
A pipeline alone isn’t context.
A pipeline links to:
That's the operational truth.
This is why knowledge graphs matter. They don’t store more data; they preserve meaning.
To truly transform the software development lifecycle, AI needs more than intelligence; it needs deep operational context. Harness AI uses a purpose-built Software Delivery Knowledge Graph to make AI fast, efficient, and exceptionally accurate. By bridging the gap between raw data and real-world delivery pipelines, we ensure that your AI operates with complete situational awareness from day one, allowing teams to ship faster without breaking things.
I’ve seen three failure modes repeat across organizations:

A knowledge graph only works when it’s use-case driven, minimal, and fresh.
The fastest way to see value is not breadth, it’s focus. Start with one use case that cannot be solved by a single system.
A strong starting point:
To support that, you need:
That’s often fewer than 10 entities. Everything else is enrichment, not a day-one requirement.
AI agents don’t need perfect context. They need the current context.
For delivery workflows, near real-time synchronization is often mandatory. When a deployment fails, an engineer doesn’t want last month’s answer; they want why it failed now. This is why the semantic layer matters. AI agents should interact with meaning, not raw tables.

AI agents must be treated as extensions of humans, not superusers.
That means:
At Harness, Policy as Code and native policy agents ensure AI can’t bypass governance — even when it’s acting autonomously.
You don’t measure a knowledge graph by node count. You measure it by outcomes.
Four metrics matter:
If context doesn’t improve decisions, it’s noise.
Imagine a developer says, in natural language: “Deploy this service to QA and production.”
Behind the scenes, an AI agent:
If the pipeline fails, the same graph enables automated remediation:
That’s not automation. That’s operational reasoning.
Traditional dashboards tell you what happened. Knowledge graphs tell you why.
Cost spikes only make sense when linked to:
Rollbacks are only safe when dependency graphs are understood. Rolling back a service without knowing the upstream and downstream impact is how outages cascade.
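Computing that upstream impact is a straightforward graph traversal. This sketch uses a hypothetical set of services to show the idea: before rolling a service back, walk the reverse dependency edges to find everything that could break.

```python
from collections import deque

# Directed dependency edges: caller -> callees (hypothetical services).
depends_on = {
    "web-frontend": ["checkout", "search"],
    "checkout": ["payments", "inventory"],
    "payments": [],
    "inventory": [],
    "search": [],
}

def upstream_impact(service: str) -> set[str]:
    """Every service that transitively depends on `service`: the set
    you must consider before rolling it back."""
    reverse: dict[str, list[str]] = {}
    for caller, callees in depends_on.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    seen: set[str] = set()
    queue = deque([service])
    while queue:
        for caller in reverse.get(queue.popleft(), []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

Rolling back `payments` here affects `checkout` and `web-frontend`; a rollback decision made without that traversal is exactly how outages cascade.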
Do this:
Avoid this:
Context is a product, not a schema.
AI-first software delivery doesn’t fail because models aren’t smart enough. It fails because platforms don’t understand themselves.
Knowledge graphs give AI the one thing it can’t generate on its own: context grounded in reality. That makes them the primary pillar of AI-first software delivery.
The future of software delivery isn't just automated; it's intelligently orchestrated. Because Harness AI uses a Software Delivery Knowledge Graph to make AI fast, efficient, and accurate, your teams can finally trust AI to handle complex operational workflows without adding risk. We’ve done the heavy lifting of mapping your operational truth so your AI can act with absolute precision.
What’s the difference between observability and a knowledge / context graph?
Observability shows what’s happening. Knowledge/Context graphs explain what it means.
Do knowledge graphs replace existing tools?
No. They connect them.
Who owns the knowledge graph?
Everyone: platform, SRE, security, and application teams.
Is this only for large enterprises?
No. Smaller teams benefit faster because tribal knowledge is thinner.
Can AI work without a knowledge graph?
Yes, but only at the task level, not the system level.


That's what the Harness platform does today. Application Security Testing brings SAST and SCA directly into the development workflow, surfacing vulnerabilities where they're faster and cheaper to fix. SCS ensures the integrity of artifacts from build to deploy, while STO provides a unified view of security posture — along with policy and governance — across the entire organization.
As code ships to production, Web Application & API Protection monitors and defends applications and APIs in real time, detecting and blocking attacks as they happen. And critically, findings from runtime don't disappear into a security team's backlog — they flow back to developers to address root causes before the next release.
The result is a closed loop: find it in code, protect it in production, fix it fast. All on a single, unified platform.
Today, we're extending that loop into AI - on both sides. AI is reshaping what you build and how you build it simultaneously. A platform that can only address one side of that equation leaves you exposed on the other. Harness closes both gaps.
In the State of AI-Native Application Security, 66% of respondents said they are flying blind when it comes to securing AI-native apps. 72% call shadow AI a gaping chasm in their security posture. 63% believe AI-native applications are more vulnerable than traditional IT applications. They’re right to be concerned.
Harness AI Security is built on the foundation of our API security platform. Every LLM call, every MCP server, every AI agent communicating with an external service does so via APIs. Your AI attack surface isn't separate from your API attack surface; it's an expansion of it. AI threats introduce new vectors like prompt injection, model manipulation, and data poisoning on top of the API vulnerabilities your teams already contend with. There is no AI security without API security.
With the launch of AI Security, we are introducing AI Discovery in General Availability (GA). AI security starts where API security starts: discovery. You can't assess or mitigate risk from AI components you don't know exist. Harness already continuously monitors your environment for new API endpoints the moment they're deployed. Recognizing LLMs, MCP servers, AI agents, and third-party GenAI services like OpenAI and Anthropic is a natural extension of that. AI Discovery automatically inventories your entire AI attack surface in real time, including calls to external GenAI services that could expose sensitive data, and surfaces runtime risks, such as unauthenticated APIs calling LLMs, weak encryption, or regulated data flowing to external models.
Beyond discovering and inventorying your AI application components, we are also introducing AI Testing and AI Firewall in Beta, extending AI Security across the full discover-test-protect lifecycle.
AI Testing actively probes your LLMs, agents, and AI-powered APIs for vulnerabilities unique to AI-native applications, including prompt injection, jailbreaks, model manipulation, data leakage, and more. These aren't vulnerabilities that a traditional DAST tool is designed to find. AI Testing was purpose-built for AI threats, continuously validating that your models and the APIs that expose them behave safely under adversarial conditions. It integrates directly into your existing CI/CD pipelines, so AI-specific security testing becomes part of every release — not a one-time audit.
AI Firewall actively protects your AI applications from AI-specific threats, such as the OWASP Top 10 for LLM Applications. It inspects and filters LLM inputs and outputs in real time, blocking prompt injection attempts, preventing sensitive data exfiltration, and enforcing behavioral guardrails on your models and agents before an attack can succeed. Unlike traditional WAF rules that require manual tuning for every new threat pattern, AI Firewall understands AI-native attack vectors natively, adapting to the evolving tactics attackers use against generative AI.
Harness AI Security with AI Discovery is now available in GA, while AI Testing and AI Firewall are available in Beta.
"As AI-assisted development becomes standard practice, the security implications of AI-generated code are becoming a material blind spot for enterprises. IDC research indicates developers accept nearly 40% of AI-generated code without revision, which can allow insecure patterns to propagate as organizations increase code output faster than they expand validation and governance, widening the gap between development velocity and application risk."
— Katie Norton, Research Manager, DevSecOps, IDC
AI Security addresses the risks inside your AI-native applications. Secure AI Coding addresses a different problem: the vulnerabilities your AI tools are introducing into your codebase.
Developers are generating more code than ever, and shipping it faster than ever. AI coding assistants now contribute to the majority of new code at many organizations — and nearly half (48%) of security and engineering leaders are concerned about the vulnerabilities that come with it. AI-generated code arrives in larger commits, at higher frequency, and often with less review than human-written code would receive.
SAST tools catch vulnerabilities at the PR stage — but by then, AI-generated code has already been written, reviewed, and often partially shipped. Harness SAST's new Secure AI Coding capability moves the security check earlier to the moment of generation, integrating directly with AI coding tools like Cursor, Windsurf, and Claude Code to scan code as it appears in the IDE. Developers never leave their workflow. They see a vulnerability warning inline, alongside a prompt to send the flagged code back to the agent for remediation — all without switching tools or even needing to trigger a manual scan.
"Security shouldn't be an afterthought when using AI dev tools. Our collaboration with Harness kicks off vulnerability detection directly in the developer workflow, so all generated code is screened from the start." — Jeff Wang, CEO, Windsurf

What sets Secure AI Coding apart from simpler linting tools is what happens beneath the surface. Rather than pattern-matching the AI-generated code in isolation, it leverages Harness's Code Property Graph (CPG) to trace how data flows through the entire application - before, through, and after the AI-generated code in question. That means Secure AI Coding can surface complex vulnerabilities like injection flaws and insecure data handling that only become visible in the context of the broader codebase. The result is security that understands your application - not just the last thing an AI assistant wrote.
When we deployed AI across our own platform, our AI ecosystem grew faster than our visibility into it. We needed a way to track every API call, identify sensitive data exposure, and monitor calls to external vendors — including OpenAI, Vertex AI, and Anthropic — without slowing down our engineering teams.
Deploying AI Security turned that black box into a transparent, manageable environment. Some milestones from our last 90 days:
The shift wasn't just operational — it was cultural. We moved from reactive monitoring to proactive defense. As our team put it: "Securing AI is foundational for us. Because our own product runs on AI, it must be resilient and secure. We use our own AI Security tools to ensure that every innovation we ship is backed by the highest security standards."
AI is moving fast. Your attack surface is expanding in two directions at once - inside the applications you're building, and inside the code your teams are generating to build them.
Harness AI Security and Secure AI Coding are available now. Whether you're trying to get visibility into the AI running in your environment, test it for vulnerabilities before attackers do, or stop insecure AI-generated code from reaching production, Harness's platform is ready.
Talk to your account team about AI Security. Get a live walkthrough of AI Discovery, AI Testing, and AI Firewall, and see how your AI attack surface maps against your existing API security posture.
Already a Harness CI customer? Start a free trial of Harness SAST - including Secure AI Coding. Connect it to your AI coding assistant, and see what's shipping in your AI-generated code today.


This is part 1 of a five-part series on building production-grade AI engineering systems.
Across this series, we will cover:
Most teams experimenting with AI coding agents focus on prompts.
That is the wrong starting point.
Before you optimize how an agent thinks, you must standardize what it sees.
AI agents do not primarily fail because of reasoning limits. They fail because of environmental ambiguity. They are dropped into repositories designed exclusively for humans and expected to infer structure, conventions, workflows, and constraints from scattered documentation.
If AI agents are contributors, then the repository itself must become agent-native.
The foundational step is introducing a standardized instruction layer that every agent can read.
That layer is AGENTS.md.
The Real Problem: Context Silos
Every coding agent needs instructions. Where those instructions live depends on the tool.
One IDE reads from a hidden rules directory.
Another expects a specific markdown file.
Another uses proprietary configuration.
This fragmentation creates three systemic problems.
1. Tool-dependent prompt locations
Instructions are locked into IDE-specific paths. Change tools and you lose institutional knowledge.
2. Tribal knowledge never gets committed
When a developer discovers the right way to guide an agent through a complex module, that guidance often lives in chat history. It never reaches version control. It never becomes part of the repository’s operational contract.
3. Inconsistent agent behavior
Two engineers working on the same codebase but using different agents receive different outputs because the instruction surfaces are different.
The repository stops being the single source of truth.
For human collaboration, we solved this decades ago with READMEs, contribution guides, and ownership files. For AI collaboration, we are only beginning to standardize.
What AGENTS.md Is
AGENTS.md is a simple, open, tool-agnostic format for providing coding agents with project-specific instructions. It is now part of the broader open agentic ecosystem under the Agentic AI Foundation, with broad industry adoption.
It is not a replacement for README.md. It is a complement.
Design principle:
Humans need quick starts, architecture summaries, and contribution policies.
Agents need deterministic build commands, exact test execution steps, linter requirements, directory boundaries, prohibited patterns, and explicit assumptions.
Separating these concerns provides:
Several major open source repositories have already adopted AGENTS.md. The pattern is spreading because it addresses a real structural gap.
Recent evaluations have also shown that explicit repository-level agent instructions outperform loosely defined “skills” systems in practical coding scenarios. The implication is clear. Context must be explicit, not implied.
A Real Example: OpenAI’s Agents SDK
A practical example of this pattern can be seen in the OpenAI Agents Python SDK repository.
The project contains a root-level AGENTS.md file that defines operational instructions for contributors and AI agents working on the codebase. You can view the full file here.
Instead of leaving workflows implicit, the repository encodes them directly into agent-readable instructions. For example, the file requires contributors to run verification checks before completing changes:
Run `$code-change-verification` before marking work complete.

It also explicitly scopes where those rules apply, such as changes to core source code, tests, examples, or documentation within the repository.
Rather than expecting an agent to infer these processes from scattered documentation, the project defines them as explicit instructions inside the repository itself.
This is the core idea behind AGENTS.md.
Operational guidance that would normally live in prompts, chat history, or internal knowledge becomes version-controlled infrastructure.
Designing an Effective Root AGENTS.md
A root AGENTS.md should be concise. Under 300 lines is a good constraint. It should be structured, imperative, and operational.
A practical structure includes four required sections.
The first section establishes the mental model. Include the project's purpose, its tech stack, and a map of the directory layout.
Agents are pattern matchers. The clearer the structural map, the fewer incorrect assumptions they make.
The second section covers build, test, and lint commands, and it must be precise. Include the exact commands for building the project, running the full test suite, running a single test, and satisfying linter requirements.
Avoid vague language. Replace “run tests” with explicit commands.
Agents execute what they are told. Precision reduces drift.
This section defines conventions.
Rather than bloating AGENTS.md, reference a separate coding standards document for naming, formatting, and style rules.
The root file should stay focused while linking to deeper guidance.
The fourth section covers boundaries and prohibited patterns. This is where most teams underinvest. Document the directories agents must not modify, the patterns they must not introduce, and the assumptions they must not make.
Agents tend to repeat statistically common patterns. Your codebase may intentionally diverge from those patterns. This section is where you enforce that divergence.
Think of this as defensive programming for AI collaboration.
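Pulling the four sections together, here is a minimal sketch of a root AGENTS.md. Every command, path, and module name below is a hypothetical placeholder for your project's own:

```markdown
# AGENTS.md

## Project Map
- `api/` - REST handlers; `core/` - domain logic; `infra/` - deployment config.
- `core/` must not import from `api/`.

## Build and Test
- Build: `make build`
- Run all tests: `make test`
- Run a single test: `make test TEST=<name>`
- Lint (required before every commit): `make lint`

## Conventions
- Follow the rules in `docs/coding-standards.md`.

## Prohibited Patterns
- Do not add new third-party dependencies without approval.
- Do not edit generated files under `gen/`.
```

Note that the file is imperative throughout: commands, not narrative.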
Hierarchical AGENTS.md: Scaling Context Correctly
Large repositories require scoped context.
A single root file cannot encode all module-specific constraints without becoming noisy. The solution is hierarchical AGENTS.md files.
Structure example:

root/
  AGENTS.md
  module-a/
    AGENTS.md
  module-b/
    AGENTS.md
    sub-feature/
      AGENTS.md

Agents automatically read nested AGENTS.md files when operating inside those directories. Context scales from general to specific.
Root defines global conventions.
Module-level files define local invariants.
Feature-level files encode edge-case constraints.
This reduces irrelevant context and increases precision.
It also mirrors how humans reason about codebases.
Compatibility Across Tools
A standard file location matters.
Some agents natively read AGENTS.md. Others require simple compatibility mechanisms such as symlinks that mirror AGENTS.md into tool-specific filenames.
The key idea is a single source of truth.
Do not maintain multiple divergent instruction files. Normalize on AGENTS.md and bridge outward if needed.
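For example, a tool that reads CLAUDE.md instead of AGENTS.md can be bridged with a symlink. CLAUDE.md here stands in for whatever filename your tool expects:

```shell
# Bridge outward from a single source of truth: AGENTS.md holds the
# instructions, tool-specific filenames are just symlinks to it.
cd "$(mktemp -d)"                    # demo in a scratch directory
printf '# Project instructions\n' > AGENTS.md
ln -s AGENTS.md CLAUDE.md            # tool-specific name, same content

# Both names now resolve to the same file:
cat CLAUDE.md
```

Because the link lives in the repository, every clone gets the bridge for free, and editing AGENTS.md updates every tool at once.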
The goal is repository-level portability. Change tools without losing institutional knowledge.
Best Practices for Agent Instructions
To make AGENTS.md effective, follow these constraints.
Write imperatively.
Use direct commands. Avoid narrative descriptions.
Avoid redundancy.
Do not duplicate README content. Reference it.
Keep it operational.
Focus on what the agent must do, not why the project exists.
Update it as the code evolves.
If the build process changes, AGENTS.md must change.
Treat violations as signal.
If agents consistently ignore documented rules, either the instruction is unclear or the file is too long and context is being truncated. Reset sessions and re-anchor.
AGENTS.md is not static documentation. It is part of the execution surface.
Ownership and Governance
If agents are contributors, then their instruction layer requires ownership.
Each module-level AGENTS.md should be maintained by the same engineers responsible for that module. Changes to these files should follow the same review rigor as code changes.
Instruction drift is as dangerous as code drift.
Version-controlled agent guidance becomes part of your engineering contract.
Why Teams Are Adopting AGENTS.md
Repositories across the industry have begun implementing AGENTS.md as a first-class artifact. Large infrastructure projects, developer tools, SDKs, and platform teams are standardizing on this pattern.
The motivation is consistent: AGENTS.md transforms prompt engineering from a personal habit into a shared, reviewable, versioned discipline.
Vercel published evaluation results showing that repository-level AGENTS.md context outperformed tool-specific skills in agent benchmarks.
Why This Matters Now
AI agents are rapidly becoming embedded in daily development workflows.
Without a standardized instruction layer, every agent session starts from scratch and every engineer improvises context by hand.
The repository must become the stable contract between humans and machines.
AGENTS.md is the first structural step toward that contract.
It shifts agent collaboration from ad hoc prompting to engineered context.
Foundation Before Optimization
In the next post, we will examine a different failure mode.
Even with a perfectly structured AGENTS.md, long AI sessions degrade. Context accumulates. Signal dilutes. Hallucinations increase. Performance drops as token counts rise.
This phenomenon is often invisible until it causes subtle architectural damage.
Part 2 will focus on defeating context rot and enforcing session discipline using structured planning, checkpoints, and meta-prompting.
Before you scale orchestration.
Before you add subagents.
Before you optimize cost across multiple model providers.
You must first stabilize the environment.
An agent-native repository is the foundation.
Everything else builds on top of it.


AI is proliferating across enterprise environments faster than security teams can govern it. From third-party LLM integrations to agentic frameworks like Model Context Protocol (MCP), most organisations have limited visibility into how many AI systems are running, what data they process, or what risks they introduce.
Three realities are driving this to the top of the security agenda: the speed of AI adoption, the sensitivity of the data these systems touch, and how little visibility security teams have into either.
Example: Shadow AI in a financial services firm
A quantitative analyst team integrates an LLM into their research workflow. The integration ships as a product feature. Six months later, a compliance review finds the endpoint is externally accessible, processes client PII, and transmits data to a third-party model provider outside the scope of the firm's data processing agreements. The AI system existed, processed regulated data, and created regulatory exposure - entirely outside the security programme's awareness.
Effective AI security is not a single capability - it is a continuous workflow across four phases:
Harness continuously discovers and classifies every AI asset from live traffic and API specifications - no manual registration required.
Shadow AI found by Harness is risk-scored, ownership-flagged, and surfaced for immediate security review. The finding moves directly into the vulnerability lifecycle with a URL, environment classification, and traffic record.
Harness continuously analyzes AI API and MCP traffic to identify the sensitive data types flowing through every discovered endpoint.
When sensitive data appears in an AI endpoint for the first time, or is transmitted to an external provider, Harness surfaces a real-time Posture Event - giving privacy and compliance teams the window to act before an exposure becomes a breach notification obligation.
Harness detects AI API and MCP tool vulnerabilities passively from live traffic - no active scanning, no disruption to production AI workloads - surfacing AI-specific weakness classes directly from observed behavior.
Risk scoring applies AI-specific weighting: an unauthenticated, externally exposed LLM endpoint is simultaneously a prompt injection target, a data extraction vector, and a compute abuse surface. Scores are dynamic, recalculating as traffic patterns and sensitive data classifications change.
Harness Posture Events feed connects AI security signals to the workflows security teams already run:
Custom notifications let each team subscribe to what matters: privacy teams can alert on sensitive data flowing to third parties, the SOC on risk score spikes, and governance on newly discovered shadow AI assets.

AI security posture management is a journey, not a deployment, and organisations typically build the capability in stages.
For organisations where a CMDB governs the asset lifecycle, Harness's Service Graph Connector extends AI-SPM into ServiceNow, syncing discovered AI assets and their risk context into the CMDB records teams already rely on.

Operationalising AI security is not about scanning prompts. It is about continuously discovering AI systems, understanding how they access sensitive data, assessing the risks they introduce, and integrating AI posture into the security operations that already exist.
The organisations that build this capability now will govern what others are still trying to find, detect exposures before they become incidents, and answer regulatory questions with data rather than approximation - continuously, not periodically.


Over the last few years, something fundamental has changed in software development.
If the early 2020s were about adopting AI coding assistants, the next phase is about what happens after those tools accelerate development. Teams are producing code faster than ever. But what I’m hearing from engineering leaders is a different question:
What’s going to break next?
That question is exactly what led us to commission our latest research, State of DevOps Modernization 2026. The results reveal a pattern that many practitioners already sense intuitively: faster code generation is exposing weaknesses across the rest of the software delivery lifecycle.
In other words, AI is multiplying development velocity, but it’s also revealing the limits of the systems we built to ship that code safely.
One of the most striking findings in the research is something we’ve started calling the AI Velocity Paradox - a term we coined in our 2025 State of Software Engineering Report.
Teams using AI coding tools most heavily are shipping code significantly faster. In fact, 45% of developers who use AI coding tools multiple times per day deploy to production daily or faster, compared to 32% of daily users and just 15% of weekly users.
At first glance, that sounds like a huge success story. Faster iteration cycles are exactly what modern software teams want.
But the data tells a more complicated story.
Among those same heavy AI users, incident rates, manual intervention, and operational stress all climb alongside deployment frequency.
What this tells me is simple: AI is speeding up the front of the delivery pipeline, but the rest of the system isn't scaling with it. It's like running trains faster than the tracks were built for: friction builds, the ride gets bumpy, and we may be closer to the edge of disaster than we'd like.

The result is friction downstream: more incidents, more manual work, and more operational stress on engineering teams.
To understand why this is happening, you have to step back and look at how most DevOps systems actually evolved.
Over the past 15 years, delivery pipelines have grown incrementally. Teams added tools to solve specific problems: CI servers, artifact repositories, security scanners, deployment automation, and feature management. Each step made sense at the time.
But the overall system was rarely designed as a coherent whole.
In many organizations today, quality gates, verification steps, and incident recovery still rely heavily on human coordination and manual work. In fact, 77% say teams often have to wait on other teams for routine delivery tasks.
That model worked when release cycles were slower.
It doesn’t work as well when AI dramatically increases the number of code changes moving through the system.
Think of it this way: if AI doubles the number of changes engineers can produce, your pipelines must either scale their automated capacity to absorb the volume or force humans to make up the difference.
Otherwise, the system begins to crack under pressure. The burden often falls directly on developers to help deploy services safely, certify compliance checks, and keep rollouts continuously progressing. When failures happen, they have to jump in and remediate at whatever hour.
These manual tasks, naturally, inhibit innovation and cause developer burnout. That’s exactly what the research shows.
Across respondents, developers report spending roughly 36% of their time on repetitive manual tasks like chasing approvals, rerunning failed jobs, or copy-pasting configuration.
As delivery speed increases, the operational load increases. That burden often falls directly on developers.
The good news is that this problem isn’t mysterious. It’s a systems problem. And systems problems can be solved.
From our experience working with engineering organizations, we've identified a few principles that consistently help teams scale AI-driven development safely.
When every team builds pipelines differently, scaling delivery becomes difficult.
Standardized templates (or “golden paths”) make it easier to deploy services safely and consistently. They also dramatically reduce the cognitive load for developers.
Speed only works when feedback is fast.
Automating security, compliance, and quality checks earlier in the lifecycle ensures problems are caught before they reach production. That keeps pipelines moving without sacrificing safety.
Feature flags, automated rollbacks, and progressive rollouts allow teams to decouple deployment from release. That flexibility reduces the blast radius of new changes and makes experimentation safer.
It also allows teams to move faster without increasing production risk.
Automation alone doesn’t solve the problem. What matters is creating a feedback loop: deploy → observe → measure → iterate.
When teams can measure the real-world impact of changes, they can learn faster and improve continuously.
AI is already changing how software gets written. The next challenge is changing how software gets delivered.
Coding assistants have increased development teams' capacity to innovate. But to capture the full benefit, the delivery systems behind them must evolve as well.
The organizations that succeed in this new environment will be the ones that treat software delivery as a coherent system, not just a collection of tools.
Because the real goal isn’t just writing code faster. It’s learning faster, delivering safer, and turning engineering velocity into better outcomes for the business.
And that requires modernizing the entire pipeline, not just the part where code is written.


Harness Artifact Registry marks an important milestone as it evolves from universal artifact management into an active control point for the software supply chain. With growing enterprise adoption and new security and governance capabilities, Artifact Registry is helping teams block risky dependencies before they reach the pipeline, reduce supply chain exposure, and scale artifact management without slowing developers down.
In little over a year, Harness Artifact Registry has grown from early discovery to strong enterprise adoption, supporting a wide range of artifact formats, enterprise-scale storage, and high-throughput CI/CD pipelines across both customers and internal teams. What started as a focused initiative inside Harness has evolved into a startup within a startup, quickly becoming a core pillar of the Harness platform.
Today, we’re sharing how Artifact Registry is helping organizations scale software delivery by simplifying artifact management, strengthening supply chain security, and improving developer experience and where we’re headed next.
In customer conversations, one theme came up repeatedly: as organizations scale CI/CD, artifacts multiply fast. Containers, packages, binaries, Helm charts, and more end up spreading across fragmented tools with inconsistent controls. Teams don't want just another registry. They want one trusted system, deeply integrated with CI/CD, that can scale globally and enforce policy by default. That's exactly what the Artifact Registry was built to be. By embedding artifact management directly into the Harness platform, it reduces tooling sprawl while giving platform engineering, DevOps, and AppSec teams centralized visibility and control, without slowing developers down.
Today, Artifact Registry supports a growing ecosystem of artifact types, with Docker, Helm (OCI), Generic, Python, npm, Go, NuGet, Dart, Conda, PHP Composer, and AI artifacts now available, and more on the way. With Artifact Registry, teams can publish, store, and distribute all of these formats from a single registry, governed by consistent policy and natively integrated with CI/CD.
The business impact is already clear. Artifact Registry has quickly gained traction with several enterprise customers, driven by strong platform integration, low-friction adoption, and the advantage of having artifact management natively embedded within the CI/CD platform.

One early customer managing artifacts across Docker, Helm, Python, NPM, Go, and more has standardized on Harness Artifact Registry, achieving 100% CI adoption across teams and pipelines.
“Harness Artifact Registry is stable, performant, and easy to trust at scale, delivering faster and more reliable artifact pulls than our previous vendor”
— SRE Lead
By unifying artifact storage with the rest of the software delivery lifecycle, Artifact Registry simplifies operations while helping teams focus on shipping software.
Software supply chain threats have become both more frequent and more sophisticated. High-profile incidents like the SolarWinds breach, where attackers injected malicious code into trusted update binaries affecting thousands of organizations, exposed how deeply a compromised artifact can penetrate enterprise systems. More recently, the Shai-Hulud 2.0 campaign saw self-propagating malware compromise hundreds of npm packages and tens of thousands of downstream repositories, harvesting credentials and spreading automatically through development environments.
As these attacks show, risk doesn't only appear after a build; it can be embedded long before artifacts reach CI/CD pipelines. That's why Harness Artifact Registry was designed with governance at its core.
Harness Artifact Registry includes Dependency Firewall, a control point that allows organizations to govern which dependencies are allowed into their environment in the first place. Rather than relying on downstream scans after a package has already been pulled into CI/CD, Dependency Firewall evaluates dependency requests at ingest using policy-based controls.
This allows teams to proactively block risky artifacts before they are ever downloaded. Organizations can prevent the use of dependencies with known CVEs or license violations, blocking risky dependencies before they reach your pipeline, and restrict access to untrusted or unsafe upstream sources by default. The result is earlier risk reduction, fewer security exceptions later in the pipeline, and stronger alignment between AppSec and development teams without slowing delivery.
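Harness expresses governance rules through Policy as Code built on Open Policy Agent. As a purely illustrative sketch - the input shape, package name, and field names here are hypothetical, not the actual Harness schema - an ingest-time deny rule in Rego might look like:

```rego
# Hypothetical sketch: deny any dependency carrying a critical CVE
# or a disallowed license before it is ever downloaded.
package dependency_firewall

deny[msg] {
  cve := input.artifact.vulnerabilities[_]
  cve.severity == "CRITICAL"
  msg := sprintf("blocked: %s has critical CVE %s", [input.artifact.name, cve.id])
}

deny[msg] {
  input.artifact.license == "AGPL-3.0"
  msg := sprintf("blocked: %s uses a disallowed license", [input.artifact.name])
}
```

The point is the placement: the rule is evaluated at the registry boundary, so a violating package is rejected at request time rather than flagged by a scan days later.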
To further strengthen supply chain protection, Artifact Registry provides built-in artifact quarantine, allowing organizations to automatically block artifacts that fail security or compliance checks. Quarantined artifacts cannot be downloaded or deployed until they meet defined policy requirements, helping teams stop risk before it moves downstream. All quarantine actions are policy-driven, fully auditable, and governed by RBAC, ensuring that only authorized users or systems can quarantine or release artifacts.

Rather than forcing teams to replace the tools they already use, Harness Artifact Registry is built to fit into real-world security workflows by unifying scanning and governance at the registry layer. Today, Artifact Registry includes built-in scanning powered by Aqua Trivy for vulnerabilities, license issues, and misconfigurations, and integrates with over 40 security scanners, including tools like Wiz, for container, SCA, and compliance checks. Teams can orchestrate these scans directly in their CI pipelines, with scan results feeding into policy evaluation to automatically determine whether an artifact is released or quarantined.

Artifact Registry also exposes APIs that allow external security and ASPM platforms to trigger quarantine or release actions based on centralized policy decisions. Together, these capabilities enable organizations to enforce consistent, policy-driven controls early, stop risky artifacts before they move downstream, and connect artifact governance to broader enterprise security workflows - all without slowing down developers.
As organizations scale, legacy registries become bottlenecks, disconnected from CI/CD, security, and governance workflows. Harness takes a different approach. Because Artifact Registry is natively integrated into the Harness platform, teams get unified access control, shared governance policies, and pipeline integration out of the box.
This tight integration has accelerated adoption by removing friction from day-to-day workflows. Teams are standardizing how artifacts are secured, distributed, and governed across the software delivery lifecycle, while keeping developer workflows fast and familiar.
Harness Artifact Registry was built to modernize artifact management for the enterprise, combining high-performance distribution with built-in security, governance, and visibility. We’ve continued to invest in a platform designed to scale with modern delivery pipelines and we’re just getting started.
Looking ahead, we’re expanding Artifact Registry in three key areas:
Support is coming for Alpine, Debian, Swift, RubyGems, Conan, and Terraform packages, enabling teams to standardize more of their software supply chain on a single platform.
We’re investing in artifact lifecycle management, immutability, audit logging, storage quota controls, and deeper integration with Harness Security Solutions.
Upcoming capabilities include semantic artifact discovery, custom dashboards, AI-powered chat, OSS gatekeeper agents, and deeper integration with Harness Internal Developer Portal.
Modern software delivery demands clear control over how software is built, secured, and distributed. As supply chain threats increase and delivery velocity accelerates, organizations need earlier visibility and enforcement without introducing new friction or operational complexity.
We invite you to sign up for a demo and see firsthand how Harness Artifact Registry delivers high-performance artifact distribution with built-in security and governance at scale.


An API failure is any response that doesn't conform to the behavior the client expects when invoking the system. One example is a request to an API that is supposed to return a list of users but returns an empty object ({}) instead. In general, a successful response carries a status code in the 200 range; a failed request carries an HTTP error code (4xx or 5xx) or, when the request never completes at the network level, no HTTP status at all (some clients report this as status 0).
An API will raise an error if it can't process a client request correctly. The most common error codes and their meanings:
- 400 Bad Request - the request was malformed or missing required parameters.
- 401 Unauthorized - the request lacks valid authentication credentials.
- 403 Forbidden - the caller is authenticated but not permitted to access the resource.
- 404 Not Found - the requested resource doesn't exist at that URL.
- 429 Too Many Requests - the caller has exceeded a rate limit.
- 500 Internal Server Error - the server hit an unexpected condition.
- 503 Service Unavailable - the server is overloaded or down for maintenance.
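The client-side half of this is deciding what each class of code means for the next action: retry, re-authenticate, or fix the request. A sketch in shell (the category names are illustrative):

```shell
# Map an HTTP status code to a coarse handling strategy, so callers
# can decide whether to retry, re-authenticate, or give up.
classify_status() {
  code="$1"
  if [ "$code" -ge 200 ] && [ "$code" -lt 300 ]; then
    echo "success"            # 2xx: the request worked
  elif [ "$code" -eq 401 ] || [ "$code" -eq 403 ]; then
    echo "auth_error"         # fix credentials/permissions first
  elif [ "$code" -eq 429 ]; then
    echo "retry_with_backoff" # rate limited: retry after a delay
  elif [ "$code" -ge 400 ] && [ "$code" -lt 500 ]; then
    echo "client_error"       # fix the request before retrying
  elif [ "$code" -ge 500 ]; then
    echo "retry_with_backoff" # transient server failure: retry idempotent calls
  else
    echo "network_error"      # 0 or unknown: the request never completed
  fi
}

classify_status 404   # prints: client_error
```

The key design choice is that only 429 and 5xx responses are worth retrying; retrying a 4xx just repeats the same mistake faster.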
An API failure can happen because of issues with the endpoints, such as network connectivity problems, latency, or load-balancing failures. The examples below should give you a good understanding of what causes an API failure.
Some APIs are better left locked down to those who need access and are only available to those using an approved key. However, when you don’t set up the correct permissions for users, you can impede the application’s basic functionality. If you’re using an external API, like Facebook, Twitter, or even Google Analytics, make sure you’re adding the permissions for your users to access the data they need. Also, keep on top of any newly added features that can increase security risks.
If you’re leveraging external APIs requiring extra configuration, get the correct API key so the app has the proper permissions. Also, provide your clients with API keys relevant to their authorization levels. Thus, your users will have the correct permissions and will seamlessly access your application.
We've all seen it happen: an API key or credential that was supposed to stay private ends up exposed. A leaked key may look benign at first, but when credentials are abused, things get ugly fast and companies lose brand trust. The biggest problem here is unsecured access to sensitive data, including through over-privileged admin accounts.
Using a secure key management system that scopes who can view keys ("View Keys"-style permissions) helps mitigate this risk. For example, AWS Key Management Service (AWS KMS) can create and manage your encryption keys. Where a managed KMS isn't an option, protect keys behind a strong master password at minimum, and only hand out keys when they're actually needed.
Untrusted tokens and session variables can break how a website functions - causing timing issues with page loads and login calls, or even a denial of service - which harms the end-user experience and your brand.
The best way to secure sensitive data is token-based authentication, where the token encodes the user's identity along with issue and expiry timestamps. Enforce expiry so that reissued tokens become invalid after a set amount of time, and scope tokens to API requests only. Session variables are usually derived from your authentication keys and should be handled the same way as privileged keys - encrypted. And keep your signing keys out of the hands of anyone who doesn't strictly need them.
If you're using an API to power a website, you either serve fresh data in real time or save it to a cache for later use. When you set an expiry time on cached API data and fail to refresh it, that data becomes unavailable: a user or application requesting it after expiry gets a 404 or 500 error.
You should use a middle ground option—a proxy API. This will allow you to cache your data before you make it available and only allow access to the correct bits of the APIs as needed. You should also schedule tasks that run daily to import updated data and bring it into your system.
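The cache-with-TTL half of that proxy can be sketched in a few lines of shell; fetch_upstream below is a hypothetical stand-in for the real upstream call:

```shell
# Serve from a local cache while it is fresh; refresh from upstream
# when the cached copy expires.
CACHE_FILE="$(mktemp -u)"   # cache location; use a stable path in practice
TTL_SECONDS=300             # how long a cached response stays fresh

fetch_upstream() {
  # placeholder for the real call, e.g.: curl -fsS https://example.com/api/v1/users
  echo '{"users": []}'
}

get_users() {
  now=$(date +%s)
  if [ -f "$CACHE_FILE" ]; then
    # cache age = now - file modification time (GNU stat, BSD fallback)
    mtime=$(stat -c %Y "$CACHE_FILE" 2>/dev/null || stat -f %m "$CACHE_FILE")
    if [ $(( now - mtime )) -lt "$TTL_SECONDS" ]; then
      cat "$CACHE_FILE"               # fresh: serve from cache
      return
    fi
  fi
  fetch_upstream | tee "$CACHE_FILE"  # stale or missing: refresh the cache
}
```

Because the cache is refreshed on demand rather than only expired, clients behind the proxy never see the 404/500 window that a hard expiry creates.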
This one isn't necessarily a mistake, but it happens when developers aren't careful about naming or use an improper URL structure for their API endpoints. When the URL structure is too complex or contains invalid characters, you will get errors and failures. Consider this example of a bad URL structure: "http://example.com/api/v1?mode=get". It is bad because the type of request should be expressed by the HTTP verb (GET, POST, and so on), not smuggled in as a query parameter. Since GET is already the default request type, a better URL is simply "http://example.com/api/v1".
Remove unsafe characters from your URLs, like angle brackets (<>), which are reserved as delimiters. Also, design URLs to be friendly for users: "https://example.com/users/name" tells users they're querying user names, unlike "https://example.com/usr/nm". Finally, avoid literal spaces and other reserved characters in query strings; percent-encode parameter values so clients and proxies can't misparse them.
This happens when teams build multiple ways of accessing the same applications, relying on generic endpoints instead of designing for specific audiences and applications. Creating many different paths to the same data results in non-intuitive routes.
There are several ways to go about this, but for most, you want to use a network proxy system that can handle the different data access methods and bring it all into one spot. This will help minimize potential issues with your APIs routes and help with user confusion and brand damage.
This can happen when organizations are not properly securing their public IP addresses, or there is no solid monitoring process. This exposes your assets by providing easy access to anyone. Exposed IPs make your application vulnerable to DDoS attacks and other forms of abuse or phishing.
Make sure you properly manage your IP addresses and have a solid monitoring system in place. Block traffic you don't use (for example, IPv6 if your services don't listen on it) and enforce strict firewall rules on your network. Only allow service access over secure transports such as TLS.
API errors are a plague on the internet. Sometimes they show up as poor performance - long response times that eventually bring an API down - and sometimes as network problems that make services unavailable. They're often caused by inconsistent resource access, missing authentication checks, faulty validation of authentication data on endpoints, or failure to read return codes from an endpoint. Once organizations recognize what causes API failures and how to mitigate them, they often turn to web application and API protection (WAAP) platforms to address the remaining security gaps. Harness WAAP by Traceable helps you analyze and protect your application from risk and thus prevent failures.
Harness WAAP is the industry’s leading API security platform that identifies APIs, evaluates API risk posture, stops API attacks, and provides deep analytics for threat hunting and forensic research. With visual depictions of API paths at the core of its technology, its platform applies the power of distributed tracing and machine learning models for API security across the entire software development lifecycle. Book a demo today.


Argo CD is a Kubernetes-native continuous delivery controller that follows GitOps principles: Git is the source of truth, and Argo CD continuously reconciles what’s running in your cluster with what’s declared in Git.
That pull-based reconciliation loop is the real shift. Instead of pipelines pushing manifests into clusters, Argo CD runs inside the cluster and pulls the desired state from Git (or Helm registries) and syncs it to the cluster. The result is an auditable deployment model where drift is visible and rollbacks are often as simple as reverting a Git commit.
For enterprise teams, Argo CD becomes shared platform infrastructure. And that changes what "install" means. Once Argo CD is a shared control plane, availability, access control, and upgrade safety matter as much as basic deployment correctness, because failures impact every team relying on GitOps.
A basic install is "pods are running." An enterprise install adds high availability, SSO-backed access control, scoped RBAC, and a safe, repeatable upgrade path.
Argo CD can be installed in two ways: as a "core" (headless) install for cluster admins who don't need the UI/API server, or as a multi-tenant install. Multi-tenant is the right default for most enterprise platform teams running GitOps across many application teams.
Before you start your Argo CD install, make sure the basics are in place. You can brute-force a proof of concept with broad permissions and port-forwarding. But if you’re building a shared service, doing a bit of prep up front saves weeks of rework.
If your team is in a regulated environment, align early on identity and SSO integration, audit logging requirements, network exposure policy, and change approval processes.
Argo CD install choices aren’t about “works vs doesn’t work.” They’re about how you want to operate Argo CD a year from now.
Helm (recommended for enterprise): version-pinned releases, values-driven configuration, and repeatable, auditable upgrades.
Upstream manifests: quick to apply and fine for evaluation, but harder to customize, pin, and upgrade consistently.
If your Argo CD instance is shared across teams, Helm usually wins because version pinning, values-driven configuration, and repeatable upgrades are easier to audit, roll back, and operate safely over time.
Enterprises often land in one of these models: a single shared instance for all teams, an instance per environment, or an instance per business unit or region.
As a rule: start with one shared instance and use guardrails (RBAC + AppProjects) to keep teams apart. Add instances only when you really need to (for example, because of regulatory separation, disconnected environments, or blast-radius requirements).
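Those guardrails are typically expressed as Argo CD AppProjects. A sketch of a per-team project (the repo URL, cluster server, and namespace pattern are placeholders for your own):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: argocd
spec:
  description: Team A's applications
  # Only allow manifests from this team's repositories
  sourceRepos:
    - https://git.example.internal/team-a/*
  # Only allow deploys into this team's namespaces on this cluster
  destinations:
    - server: https://kubernetes.default.svc
      namespace: team-a-*
  # Disallow creating cluster-scoped resources entirely
  clusterResourceWhitelist: []
```

Applications assigned to this project cannot pull from other teams' repos or deploy outside their own namespaces, which is what keeps a single shared instance safe.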
When Argo CD is a shared dependency, high availability (HA) is important. If every team depends on Argo CD to deploy, a single-replica Argo CD server slows everyone down and becomes a reliable source of pages.
There are three common access patterns: port-forwarding for quick validation, a LoadBalancer or NodePort service for simple exposure, and an Ingress in front of the API server with TLS.
For most enterprise teams, the sweet spot is Ingress + TLS + SSO, with internal-only access unless your operating model demands external access.
If you’re building Argo CD as a shared service, Helm gives you the cleanest path to versioned, repeatable installs.
helm repo add argo https://argoproj.github.io/argo-helm
helm repo update
# Optional: list available versions so you can pin one
helm search repo argo/argo-cd --versions | head -n 10
In enterprise environments, “latest” isn’t a strategy. Pin a chart version so you can reproduce your install and upgrade intentionally.
kubectl create namespace argocd
Keeping Argo CD isolated in its own namespace simplifies RBAC, backup scope, and day-2 operations.
Start by pulling the chart’s defaults:
helm show values argo/argo-cd > values.yaml
Then make the minimum changes needed to match your access model. Many tutorials demonstrate NodePort because it’s easy, but most enterprises should standardize on Ingress + TLS.
Here’s a practical starting point (adjust hostnames, ingress class, and TLS secret to match your environment):
# values.yaml (example starter)
global:
  domain: argocd.example.internal

configs:
  params:
    # Common when TLS is terminated at an ingress or load balancer.
    server.insecure: "true"

server:
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - argocd.example.internal
    tls:
      - secretName: argocd-tls
        hosts:
          - argocd.example.internal

# Baseline resource requests to reduce noisy-neighbor issues.
controller:
  resources:
    requests:
      cpu: 200m
      memory: 512Mi

repoServer:
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
This example focuses on access configuration and baseline resource isolation. In most enterprise environments, teams also explicitly manage RBAC policies, NetworkPolicies, and Redis high-availability decisions as part of the Argo CD platform configuration.
If your clusters can’t pull from public registries, you’ll need to mirror Argo CD and dependency images (Argo CD, Dex, Redis) into an internal registry and override chart values accordingly.
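As a sketch, the image overrides in values.yaml might look like this (`registry.internal.example` is a placeholder, and the exact value paths can differ between chart versions, so verify them against your pinned chart's defaults):

```yaml
# Placeholder internal registry; confirm value paths with `helm show values` for your chart version.
global:
  image:
    repository: registry.internal.example/argoproj/argocd
dex:
  image:
    repository: registry.internal.example/dexidp/dex
redis:
  image:
    repository: registry.internal.example/library/redis
```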
Use helm upgrade --install so your install and upgrade command is consistent.
helm upgrade --install argocd argo/argo-cd \
--namespace argocd \
--values values.yaml
Validate that core components are healthy:
kubectl get pods -n argocd
kubectl get svc -n argocd
kubectl get ingress -n argocd
If something is stuck, look at events:
kubectl get events -n argocd --sort-by=.lastTimestamp | tail -n 30
Most installs include these core components: argocd-server (the API and UI), the application controller (reconciles desired state in Git against live cluster state), the repo server (fetches repos and renders manifests), Redis (caching), and Dex (optional SSO federation).
Knowing what each component does helps you troubleshoot quickly when teams start scaling usage.
Your goal is to get a clean first login and then move toward enterprise access (Ingress + TLS + SSO).
kubectl port-forward -n argocd svc/argocd-server 8080:443
Then open https://localhost:8080.
It’s common to see an SSL warning because Argo CD ships with a self-signed cert by default. For a quick validation, proceed. For enterprise usage, use real TLS via your ingress/load balancer.
Once DNS and TLS are wired, access Argo CD through its stable hostname instead of the port-forward.
If your ingress terminates TLS at the edge, running the Argo CD API server with TLS disabled behind it (for example, server.insecure: "true") is a common pattern.
Default username is typically admin. Retrieve the password from the initial secret:
kubectl -n argocd get secret argocd-initial-admin-secret \
-o jsonpath="{.data.password}" | base64 --decode; echo
After you’ve logged in and set a real admin strategy using SSO and RBAC, the initial admin account should be treated as a break-glass mechanism only. Disable or tightly control its use, rotate credentials, and document when and how it is allowed.
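One common way to disable the built-in admin account once SSO is working is via the `argocd-cm` ConfigMap. This is a sketch; confirm the exact setting against the docs for your Argo CD version:

```shell
# Disable local admin login after SSO is in place.
# Keep a documented, audited procedure for re-enabling it as break-glass access.
kubectl -n argocd patch configmap argocd-cm \
  --type merge \
  -p '{"data":{"admin.enabled":"false"}}'
```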
If you want a quick Argo CD install for learning or validation, upstream manifests get you there fast.
Important context: the standard install.yaml manifest is designed for same-cluster deployments and includes cluster-level privileges. It’s also the non-HA install type that’s typically used for evaluation, not production. If you need a more locked-down footprint, Argo CD also provides namespace-scoped and HA manifest options in the upstream manifests.
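If you need those variants, they live at predictable paths in the upstream repo. Pick one; these are alternatives to the standard install.yaml, not additions:

```shell
# HA variant (needs multiple nodes so replicas can spread across failure domains)
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml

# Namespace-scoped variant (no cluster-level privileges)
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/namespace-install.yaml
```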
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
Validate:
kubectl get pods -n argocd
kubectl get svc -n argocd
Then port-forward to access the UI:
kubectl port-forward -n argocd svc/argocd-server 8080:443
Use admin plus the password from argocd-initial-admin-secret as shown in the prior section.
For enterprise rollouts, treat manifest installs as a starting point. If you’re standardizing Argo CD across environments, Helm is easier to control and upgrade.
A real install isn’t “pods are running.” A real install is “we can deploy from Git safely.” This quick validation proves exactly that: Argo CD can read an approved repo and sync it into an approved namespace, within project guardrails.
Keep it boring and repeatable. For example:
apps/
  guestbook/
    base/
    overlays/
      dev/
      prod/
Or, if you deploy with Helm:
apps/
  my-service/
    chart/
    values/
      dev.yaml
      prod.yaml
Even for a test app, start with the guardrail. AppProjects define what a team is allowed to deploy, and where.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-sandbox
  namespace: argocd
spec:
  description: "Sandbox boundary for initial validation"
  sourceRepos:
    - "https://github.com/argoproj/argocd-example-apps.git"
  destinations:
    - namespace: sandbox
      server: https://kubernetes.default.svc
  namespaceResourceWhitelist:
    - group: "apps"
      kind: Deployment
    - group: ""
      kind: Service
    - group: "networking.k8s.io"
      kind: Ingress
Apply it:
kubectl apply -f appproject-sandbox.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
  namespace: argocd
spec:
  project: team-sandbox
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    targetRevision: HEAD
    path: guestbook
  destination:
    server: https://kubernetes.default.svc
    namespace: sandbox
  syncPolicy:
    automated:
      selfHeal: true
      prune: false
    syncOptions:
      - CreateNamespace=true
Note: In many enterprise environments, namespace creation is restricted to platform workflows or Infrastructure as Code pipelines. If that applies to your organization, remove CreateNamespace=true and require namespaces to be provisioned separately.
Apply it:
kubectl apply -f application-guestbook.yaml
Now confirm the deployment worked: the guestbook Application should appear in Argo CD and reach a Synced and Healthy state in the sandbox namespace.
By default, Argo CD polls repos periodically. Many teams configure webhooks (GitHub/GitLab) so Argo CD can refresh and sync quickly when changes land. It’s not required for day one, but it improves feedback loops in active repos.
This is where most enterprise rollouts either earn trust or lose it. If teams don’t trust the platform, they won’t onboard their workloads.
Focus on these enterprise minimums: SSO for authentication, RBAC for authorization, and AppProjects for deployment boundaries.
Practical rollout order:
Break-glass access should exist, but it should be documented, auditable, and rare.
Enterprise teams don’t struggle because they can’t install Argo CD. They struggle because Argo CD becomes a shared dependency—and shared dependencies need operational maturity.
At scale, pressure points are predictable:
Plan a path to HA before you onboard many teams. If HA Redis is part of your design, validate node capacity so workloads can spread across failure domains.
Keep monitoring simple and useful:
Also, decide alert ownership and escalation paths early. Platform teams typically own Argo CD availability and control-plane health, while application teams own application-level sync and runtime issues within their defined boundaries.
Git is the source of truth for desired state, but you still need to recover platform configuration quickly.
Backup: capture the argocd namespace’s ConfigMaps, Secrets, and Argo CD custom resources (Applications and AppProjects) on a schedule.
Then run restore tests on a schedule. The goal isn’t perfection—it’s proving you can regain GitOps control safely.
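A minimal backup sketch uses the CLI’s built-in exporter (this assumes the `argocd` CLI is installed and your kubeconfig points at the right cluster):

```shell
# Export Argo CD state (applications, projects, settings) to a dated backup file.
argocd admin export -n argocd > "argocd-backup-$(date +%Y%m%d).yaml"

# Restore later with:
# argocd admin import -n argocd - < argocd-backup-YYYYMMDD.yaml
```

Store backups outside the cluster they protect, and exercise the import path as part of your scheduled restore tests.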
A safe enterprise approach:
Avoid “random upgrades.” Treat Argo CD as platform infrastructure with controlled change management.
Argo CD works well on EKS, but enterprise teams often have extra constraints: private clusters, restricted egress, and standard AWS ingress patterns.
Common installation approaches on EKS:
For access, most EKS enterprise teams standardize on an ingress backed by AWS Load Balancer Controller (ALB) or NGINX, with TLS termination at the edge.
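As an illustration, Helm values for an internal ALB might look like this (the hostname is a placeholder, and the annotation set will vary with your AWS Load Balancer Controller configuration and certificate setup):

```yaml
# Illustrative values for an internal ALB via AWS Load Balancer Controller.
server:
  ingress:
    enabled: true
    ingressClassName: alb
    annotations:
      alb.ingress.kubernetes.io/scheme: internal
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    hosts:
      - argocd.example.internal
```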
An enterprise-grade Argo CD install is less about getting a UI running and more about putting the right foundations in place: a repeatable deployment method (typically Helm), a stable endpoint for access and SSO, and clear boundaries so teams can move fast without stepping on each other. If you take away one thing, make it this: treat Argo CD like shared platform infrastructure, not a one-off tool.
Start with a pinned, values-driven Helm install. Then lock in the enterprise minimums (SSO, RBAC, and AppProjects) before you onboard your second team. Finally, operationalize it with monitoring, backups, and a staged upgrade process so Argo CD stays reliable as your cluster and application footprint grows.
When you need orchestration, approvals, and progressive delivery across complex releases, pair GitOps with Harness CD. Request a demo.
These are quick answers to the most common questions teams have when they install Argo CD.
Most enterprise teams should use Helm to install Argo CD because it lets you pin versions, keep configuration in Git, and upgrade in a predictable way. Upstream manifests are a fast way to get started if you’re still evaluating Argo CD.
Use an internal hostname, terminate TLS at your ingress or load balancer, and require SSO for interactive access. Don’t expose Argo CD publicly unless your operating model genuinely demands it.
Pin your chart and app versions, test upgrades in a non-production environment, then promote the same change through your other environments. After each upgrade, verify that you can log in, reach your repos, and sync a real application.
Use RBAC and AppProjects to set limits on a single shared instance. Only approved repos should be used by app teams to deploy to approved namespaces and clusters.
Back up the argocd namespace (ConfigMaps, Secrets, and CRs) and keep app definitions in Git. Run restore tests on a schedule so recovery steps are proven, not theoretical.


If you've worked with builds and deployments, then you already know how central Docker images, dependencies, and containers are to modern software delivery. The introduction of Docker revolutionised how we package and run software, while the Open Container Initiative (OCI) brought much-needed standardisation to container formats and distribution. Docker made containers mainstream; OCI made them universal.
Even though Docker Hub and private registries have served us well, they often introduce challenges at scale:
And even after every dependency and sanity check passes, one question remains:
How effectively can you integrate and deploy artifacts through your CI/CD supply chain, without risking credential leaks or losing end-to-end visibility?
This is exactly where Harness Artifact Registry comes in.
Harness Artifact Registry is a cloud-native, secure artifact storage and management platform built for the future. Unlike traditional Docker registries or basic container registries, it's designed not just to store your Docker images and artifacts but also to actively secure and govern them. It's fully OCI-compliant, supporting Docker containers and other container formats natively, whilst integrating directly with CI/CD pipelines, policy engines, and vulnerability scanners.
Let me walk you through the complete journey of how an artifact moves through Harness Artifact Registry, from the moment you build it to when it's deployed in production.

Docker Registry Client Setup
It all begins with the very first step after you build your Docker image on your system: storing it in a secure artifact storage layer through your container registry. Harness Artifact Registry supports more than 16 registry types and is fully OCI-compliant. You can simply use Docker to push the artifacts into the registry or even use the Harness CLI for it.
It is as simple as pushing to Docker Hub. Once you've authenticated with your Harness Artifact Registry, you can use standard Docker commands to push Docker images:
# Step 1: Tag the existing image (using its digest) with a new tag
docker tag <REGISTRY_URL>/<REPOSITORY>/<IMAGE_NAME>@<DIGEST> <REGISTRY_URL>/<REPOSITORY>/<IMAGE_NAME>:<NEW_TAG>
# Step 2: Push the newly tagged image to the registry
docker push <REGISTRY_URL>/<REPOSITORY>/<IMAGE_NAME>:<NEW_TAG>
Because Harness Artifact Registry is fully OCI-compliant, it works seamlessly with any OCI-compatible client. This means you don't need to learn new tools or change your existing Docker workflows. Whether you're migrating from Google Artifact Registry, Azure Container Registry, AWS ECR, or Docker Hub, the experience remains consistent.
We understand that builds pull in many versioned dependencies, some from open-source repositories, and that these sources can vary significantly across enterprises. That’s why we’ve made it easy to integrate custom registries so you can cache artifacts via a proxy.
Harness Artifact Registry allows you to configure upstream registries as remote repositories. This means you can:
Apart from Docker Hub, Google Artifact Registry, and AWS ECR, you can set up custom registries with just a Remote Registry URL and basic authentication using a username and password. This proxy capability ensures that even when your teams pull Docker images from public registries, everything flows through Harness Artifact Registry first, giving you complete visibility, governance, and unified artifact storage control.
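Once an upstream is configured, pulls route through the registry rather than hitting the public source directly. Using the same placeholder convention as the earlier examples (the exact repository path layout depends on how your proxy registry is configured):

```shell
# Pull a public image through the Harness Artifact Registry proxy
# so it is cached, scanned, and governed centrally.
# <REGISTRY_URL> and <PROXY_REPOSITORY> are placeholders for your environment.
docker pull <REGISTRY_URL>/<PROXY_REPOSITORY>/library/nginx:1.27
```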
This is where Harness Artifact Registry truly shines. Rather than treating security as an afterthought, it's baked into every layer of the artifact lifecycle.
Container vulnerability scanners detect security issues in your Docker images and container images before they can cause problems. Harness Artifact Registry integrates with industry-leading scanners like Aqua Trivy and Snyk, allowing you to automatically scan every artifact that enters your registry.
Here's what makes this powerful: when a Docker image is pushed, Harness automatically triggers a security pipeline that scans the artifact and generates a complete Software Bill of Materials (SBOM) along with detailed vulnerability reports. You get immediate visibility into:
The SBOM and vulnerability details are displayed directly in the Harness interface, giving you complete transparency into what's inside your containers and their security posture. This level of container security goes beyond what traditional Docker registries offer.
When you're pulling dependencies from external sources through the upstream proxy, the Dependency Firewall actively blocks risky or unapproved packages before they even enter your registry. You can configure it to either block suspicious dependencies outright or set it to warn mode for your team to review. This means malicious dependencies are stopped at the gate, not discovered later in your pipeline.
Beyond vulnerability scanning, you can assign policy sets to be evaluated against each artifact. These policies act as automated gatekeepers, enforcing your organisation's security and compliance requirements.
For example, you might create policies that:
Policies are evaluated automatically, and non-compliant artifacts can be quarantined or blocked entirely.
When an artifact fails a security scan or violates a policy, it can be automatically quarantined. This prevents it from being used in deployments whilst still allowing your team to investigate and remediate the issues. This proactive approach significantly reduces your attack surface and ensures only verified artifacts make it to production.
Your artifact is now ready, fully scanned for vulnerabilities, and stored securely in your container registry. This is where everything comes together for developers and platform engineers alike. The seamless integration between Harness Artifact Registry and Harness CI/CD pipelines means you can build Docker images, store artifacts, and deploy without context switching or managing complex credentials across multiple registry systems.

Harness CI is all about getting your code built, tested, and packaged efficiently. Harness Artifact Registry fits naturally into this workflow by providing native steps that eliminate the complexity of managing Docker registry credentials and connections.
Build and Push to Docker: This native CI step allows you to build your Docker images and push them directly to Harness Artifact Registry without any external connectors. The platform handles Docker registry authentication automatically, so you can focus on your build logic rather than credential management.
Upload artifacts: Beyond Docker images, you can publish Maven artifacts, npm packages, Helm charts, or generic files directly to Harness Artifact Registry. This unified artifact management approach means all your build outputs live in one place, with consistent vulnerability scanning and policy enforcement across every artifact type.
The essence here is simplicity: your CI pipeline produces artifacts and Docker containers, and they're automatically stored, scanned, and made available for deployment, all within the same platform.
Every deployment needs an artifact. Whether you're deploying Docker containers to Kubernetes, AWS ECS, Google Cloud Run, or traditional VMs, your deployment pipeline needs to know which version of your application to deploy and where to get it from.
This is where Harness Artifact Registry becomes invaluable. Because it's natively integrated with Harness CD, your deployment pipelines can pull Docker images and artifacts directly without managing external Docker registry credentials or complex authentication flows.
Harness CD supports numerous deployment types (often called CD swimlanes), and Harness Artifact Registry works seamlessly with all of them. When you configure a CD service, you simply select Harness Artifact Registry as your artifact source, specify which container registry and artifact to use, and define your version selection criteria.
From there, the deployment pipeline handles everything: authenticating with the registry, pulling the correct Docker image version, verifying it's passed vulnerability scans and security checks, and deploying it to your target environment. You can deploy to production with strict version pinning for stability, or to non-production environments with dynamic version selection for testing. The choice is yours, and it's all configured through the same intuitive interface.
The real power lies in the traceability. Every deployment is logged with complete details: which artifact version was deployed, when, by whom, and to which environment. If you need to roll back, the previous Docker image versions are right there, ready to be redeployed.
From the moment you build a Docker image to when it's running in production, Harness Artifact Registry provides a complete, secure, and governed artifact lifecycle. You get container security that prevents issues before they occur, complete visibility through SBOM generation and audit logs, and native CI/CD integration that eliminates the complexity of managing multiple Docker registries and credentials.
This isn't just about storing Docker images. It's about building confidence in your software supply chain with a secure, OCI-compliant container registry.
In a world where supply chain attacks are increasingly common and compliance requirements continue to grow, having a robust artifact management and container registry strategy is essential. Harness Artifact Registry delivers that strategy through a platform that's both powerful and intuitive.
Whether you're a developer pushing your first Docker image, a platform engineer managing deployment pipelines, or a security professional ensuring compliance, Harness Artifact Registry provides the tools you need to move fast without compromising on security.
Ready to experience a fully OCI-compliant Docker registry with built-in vulnerability scanning, dependency firewall, and seamless CI/CD integration? Explore Harness Artifact Registry and see how it transforms your software delivery pipeline with secure artifact management.


Database systems store some of the most sensitive data of an organization such as PII, financial records, and intellectual property, making strong database governance non-negotiable. As regulations tighten and audit expectations increase, teams need governance that scales without slowing delivery.
Harness Database DevOps addresses this by applying policy-driven governance using Open Policy Agent (OPA). With OPA policies embedded directly into database pipelines, teams can automatically enforce rules, capture audit trails, and stay aligned with compliance requirements. This blog outlines how to use OPA in Harness to turn database compliance from a manual checkpoint into a built-in, scalable part of your DevOps workflow.
Organizations face multiple challenges when navigating database compliance:
These challenges highlight the necessity of embedding governance directly into database development and deployment pipelines, rather than treating compliance as a reactive checklist.
Harness Database DevOps is designed to offer a comprehensive solution to database governance - one that aligns automation with compliance needs. It enables teams to adopt policy-driven controls on database change workflows by integrating the Open Policy Agent (OPA) engine into the core of database DevOps practices.
What is OPA and Policy as Code?
Open Policy Agent (OPA) is an open-source, general-purpose policy engine that decouples policy decisions from enforcement logic, enabling centralized governance across infrastructures and workflows. Policies in OPA are written in the Rego declarative language, allowing precise expression of rules governing actions, access, and configurations.
Harness implements Policy as Code through OPA, enabling teams to store, test, and enforce governance rules directly within the database DevOps lifecycle. This model ensures that compliance controls are consistent, auditable, and automatically evaluated before changes reach production.
Here’s a structured approach to implementing database governance with OPA in Harness:
Start by cataloging your regulatory obligations and internal governance policies. Examples include:
Translate these requirements into concrete, testable rules that can be expressed in Rego.
Within the Harness Policy Editor, define OPA policies that codify governance rules. For example, a policy might block any migrations containing operations that remove columns in production environments without explicit DBA approval.
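As an illustration, such a rule could be sketched in Rego like this. The input document shape below is an assumption for the example, not the exact payload Harness supplies to the policy, so adapt the field names to the actual evaluation input:

```rego
package db.governance

# Deny migrations that drop a column in production without explicit DBA approval.
# The input structure (changes, environment, approvals) is illustrative only.
deny[msg] {
  change := input.changes[_]
  change.type == "dropColumn"
  input.environment == "production"
  not input.approvals.dba
  msg := sprintf("dropColumn on table %s requires DBA approval in production", [change.table])
}
```

When `deny` produces any messages, the pipeline step fails and the messages surface in the policy evaluation output, giving the author an actionable reason for the block.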
Harness policies are modular and reusable: you can import and extend them as part of broader governance packages. This allows cross-team reuse and centralized management of rules. Key aspects include:
By expressing governance as code, you ensure consistency and remove ambiguity in policy enforcement.
Policies can be linked to specific triggers within your database deployment workflow, for instance, evaluating rules before a migration is applied or before a pipeline advances to production. This integration ensures that non-compliant changes are automatically blocked, while compliant changes proceed seamlessly, maintaining the balance between speed and control.
Harness evaluates OPA policies at defined decision points in your pipeline, such as pre-deployment checks. This prevents risky actions, enforces access controls, and aligns every deployment with governance objectives without manual intervention.
Audit Trails and Traceability
Every policy evaluation is logged, creating an auditable trail of who changed what, when, and why. These logs serve as critical evidence during compliance audits or internal reviews, reducing the overhead and risk associated with traditional documentation practices.
By enforcing the principle of least privilege, policies ensure that users and applications possess only the necessary permissions for their specific roles. This restriction on access is crucial for minimizing the potential attack surface and maintaining compliance with regulatory requirements for data access governance.
Database governance is an essential pillar of enterprise compliance strategies. By embedding OPA-based policy enforcement within Harness Database DevOps, organizations can automate compliance controls, minimize risk, and maintain developer productivity. Policy as Code provides a scalable, auditable, and consistent framework that aligns with both regulatory obligations and the need for agile delivery.
Transforming database governance from a manual compliance burden into an automated, integrated practice empowers teams to innovate securely, confidently, and at scale - ensuring that every change respects the policies that protect your data, your customers, and your brand.
Need more info? Contact Sales