The Agent Harness

Mar 31, 2026

The Agent is the Harness

An agent harness is the runtime system that turns a model into an operating agent, but the harness required for a coding or personal agent is fundamentally different from the one required for an enterprise agent. Coding harnesses are built for workspace execution; enterprise harnesses are built for governed participation in long-running business processes under identity, policy, and trust.

Introduction

Agent harness design has moved to the center of agent engineering. The shift became harder to ignore after the capability jump in late 2025, when stronger models made clear that production performance was no longer explained by model quality alone. The surrounding runtime now determines whether an agent can sustain continuity, use tools under control, assemble the right context, recover from failure, and produce evidence that its work should be accepted.

The problem is that much of the current harness discussion still comes from coding agents and personal agents. That work has value, but it carries structural assumptions: a visible user, a bounded session, a local workspace, broad tool access, and a task centered on files or research output. Teams building enterprise agents often extend that pattern instead of recognizing that the operating model itself has changed.

The central claim of this article is that an enterprise harness is not a larger coding harness. It is a different architectural form: a governed execution layer for institutional work. That form can be understood through three dependent layers. At the base is security, identity, and trust. Above that sits the execution and control plane. At the top sits the operating model that defines how the agent participates in work. The top layer depends on the middle, and the middle depends on the base.

This article defines the harness, introduces that three-layer model, summarizes the broader contrast between coding and enterprise harnesses, and then focuses on four dimensions where the coding pattern leads to the wrong architecture rather than merely an incomplete one. Those four are identity, authorization, state management, and failure handling.

What an agent harness is

An agent harness is the engineered runtime system that surrounds a model and makes the agent operational over time. It is the machinery that lets the agent receive work, assemble context, access tools, persist state, act within boundaries, recover from failure, and produce evidence that its work was valid. In practical terms, the harness determines what the agent can see, what it can do, what it remembers, and what conditions must be satisfied before its work is accepted.

The harness is more than a prompt, more than a tool list, and more than a wrapper script. In small systems it may include prompts, memory files, a filesystem, a browser, a shell, tests, and a progress log. In larger systems it expands into identity, authorization, orchestration, observability, policy enforcement, workflow state, audit evidence, and service-level reliability. The model still matters, but the harness decides whether that model can be trusted in execution.

A general harness can be described in terms of four component groups: context and state management; action and environment access; control and recovery; and verification and evidence. Together they determine how the agent receives information, acts on the world, persists continuity, and proves that its work is acceptable.
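The four groups can be sketched as interfaces. The example below is a hypothetical illustration. The protocol names and method signatures (`ContextAndState`, `invoke_tool`, `on_failure`, and so on) are assumptions made for this sketch, not a standard API. It includes a deliberately tiny workstation-style implementation so the shape is concrete.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ContextAndState(Protocol):
    """Assembles what the agent sees and persists what it must carry forward."""

    def assemble_context(self, task_id: str) -> dict[str, Any]: ...

    def save_state(self, task_id: str, state: dict[str, Any]) -> None: ...


@runtime_checkable
class ActionAndEnvironment(Protocol):
    """Mediates every tool call the agent makes against its environment."""

    def invoke_tool(self, tool: str, args: dict[str, Any]) -> Any: ...


@runtime_checkable
class ControlAndRecovery(Protocol):
    """Decides what happens next when an action fails."""

    def on_failure(self, task_id: str, error: Exception) -> str: ...


@runtime_checkable
class VerificationAndEvidence(Protocol):
    """Decides whether output is acceptable and records the evidence."""

    def verify(self, task_id: str, output: Any) -> bool: ...


class ToyWorkspaceHarness:
    """A deliberately small, workstation-style implementation of all four groups."""

    def __init__(self) -> None:
        self.notes: dict[str, dict[str, Any]] = {}  # local progress notes
        self.evidence: list[str] = []

    def assemble_context(self, task_id: str) -> dict[str, Any]:
        return self.notes.get(task_id, {})

    def save_state(self, task_id: str, state: dict[str, Any]) -> None:
        self.notes[task_id] = state

    def invoke_tool(self, tool: str, args: dict[str, Any]) -> Any:
        return f"{tool} called with {sorted(args)}"  # stand-in for a real tool

    def on_failure(self, task_id: str, error: Exception) -> str:
        return "retry"  # local correction only

    def verify(self, task_id: str, output: Any) -> bool:
        ok = output is not None
        self.evidence.append(f"{task_id}: verified={ok}")
        return ok
```

An enterprise harness could keep the same four interfaces, but the weight behind each method changes: `on_failure`, for example, could no longer simply return `"retry"`.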

That definition applies to both coding agents and enterprise agents. What changes is the weight and design of each component. A coding harness treats the workspace as the center of gravity. An enterprise harness treats the institution as the center of gravity: its identity, policy, and trust requirements, and the long-running business processes the agent participates in. That change in focus is what changes the architecture.

Three Layers of an Enterprise Harness

The first layer is security, identity, and trust. This is the foundation. An enterprise agent must have a stable identity, explicit credentials, lifecycle state, revocation paths, and bounded authorization. Its trust posture cannot be implied by the fact that it is useful. It has to be grounded in evidence, governance, and control. Without that layer, there is no reliable way to know which agent acted, which permissions it held, or whether it should continue to be allowed to operate.

The second layer is the execution and control plane. This is the runtime machinery that lets the agent do work under control. It includes tool access, state management, memory artifacts, orchestration, failure handling, reliability, and scalability. Every action the agent takes passes through this layer, and every action is constrained by the layer below it. Tool access, for example, is never just a convenience feature in an enterprise harness. It is a controlled action surface whose boundaries depend on identity and authorization.

The third layer is the operating model. This defines how the agent fits into institutional work. It includes the agent’s purpose, its unit of work, its human relationship, runtime environment, context sources, conversation model, discoverability, observability, verification, and explainability. This layer describes what the agent is doing in the enterprise and how the enterprise expects to interact with it. It only works if the layers below it can sustain it.

The dependencies matter. A harness that gets the operating model right but lacks strong identity and authorization is ungovernable. A harness with strong security but weak state management cannot sustain long-running work. A harness with good tooling but no structured failure handling cannot safely participate in real processes. An enterprise harness is therefore a systems design problem. The layers have to be designed together.

Coding and personal harnesses also have versions of these layers, but they are lighter, often implicit, and organized around a different center of gravity. They stabilize a workstation-shaped environment. Enterprise harnesses stabilize an institutional environment. That is the shift the rest of the article examines.

Summary Comparison

[Figures: summary comparison of coding and enterprise harnesses, including security and identity and the operating model]

Most of these are differences of degree. A coding harness handles state; an enterprise harness handles more state for longer periods under stricter durability requirements. A coding harness supports verification; an enterprise harness verifies against broader standards. Those are meaningful shifts, but they can be extended incrementally. The deeper issue appears where the workstation assumption itself stops holding. The test is whether extending the coding pattern produces an architecture that is merely incomplete or one that is structurally wrong for the operating context. The next four sections focus on those cases.

Identity: session-scoped runtime versus persistent principal

The decisive question is simple: is the agent a runtime convenience or a recognized enterprise principal? Coding harnesses usually choose the first answer. The user knows which tool is running, the repo knows which session made a change, and that is often enough. The harness may distinguish between worktrees, sessions, or runtime instances, but it does not need to treat the agent as a durable institutional principal because the human sponsor remains close to the task.

Enterprise settings break that assumption. An agent that handles first-notice-of-loss intake for auto claims may read claim records from a claims administration platform, pull police reports from a document repository, consult a coverage policy service, and update a customer workflow queue. If identity remains session-scoped in that environment, the organization loses the ability to answer basic governance questions. Which agent acted? Which approved version was deployed? What credential scope did it hold? Under whose authority did it recommend denial, escalation, or payment review?

Extending the coding pattern produces a system that can act but cannot be governed. Actions may be logged, but the logs do not point to a stable enterprise principal. Permissions may exist, but they are attached to transient runtime contexts rather than to a durable identity. Discoverability weakens because the organization cannot publish a meaningful description of an agent that has no stable institutional form. Accountability weakens for the same reason: remediation has no anchor.

An enterprise harness has to assign stable identity at the foundation. The agent needs ownership, lifecycle state, credentials, policy bindings, and revocation paths. That identity has to persist across sessions and deployments so the enterprise can tie actions to an approved principal rather than to an ephemeral execution. Identity is what lets every higher-level control work. Authorization depends on it. Observability depends on it. Trust depends on it.
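A minimal sketch of such a principal record follows, assuming a Python dataclass stands in for whatever identity provider or agent registry an enterprise actually uses. The field names and lifecycle states are hypothetical; the point is that ownership, version, lifecycle, credentials, policy bindings, and revocation attach to one stable `agent_id` rather than to a session.

```python
from dataclasses import dataclass
from enum import Enum


class Lifecycle(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    SUSPENDED = "suspended"
    REVOKED = "revoked"


@dataclass
class AgentPrincipal:
    """A durable enterprise identity for one agent (hypothetical schema)."""

    agent_id: str  # stable across sessions and deployments
    owner: str  # accountable team or person
    version: str  # approved release bound to this identity
    lifecycle: Lifecycle = Lifecycle.DRAFT
    credential_scopes: frozenset[str] = frozenset()
    policy_bindings: tuple[str, ...] = ()

    def can_operate(self) -> bool:
        # Only an approved principal may act; suspension or revocation stops it.
        return self.lifecycle is Lifecycle.APPROVED

    def suspend(self) -> None:
        self.lifecycle = Lifecycle.SUSPENDED

    def revoke(self) -> None:
        # Revocation changes the state of the principal itself, so any runtime
        # that checks can_operate() stops; its credential scopes are cleared too.
        self.lifecycle = Lifecycle.REVOKED
        self.credential_scopes = frozenset()
```

Because revocation is a property of the principal, shutting down one container and withdrawing the agent's authority become two distinct operations.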

In the claims example, a disputed payout six months later should not send investigators searching through runtime logs that refer only to containers or sessions. They should be able to identify the specific claims-triage agent, its version, its approval status at the time, the credential scope it held, and the policy set attached to it. Under a persistent identity model, that is possible. Under a session-scoped model, the organization can shut down a runtime, but it cannot manage a principal. That is the difference between an agent that happens to exist in production and one that can be governed as an enterprise actor.
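The six-month lookup can be sketched as an append-only history of principal snapshots keyed by a stable agent ID. This is a hypothetical registry written for illustration; `PrincipalSnapshot`, `AgentRegistry`, and `principal_at` are assumed names, and a production system would back this with a governed store rather than an in-memory dict.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PrincipalSnapshot:
    """What the registry knew about an agent from `effective` onward."""

    effective: datetime
    version: str
    approval_status: str
    credential_scopes: frozenset[str]
    policy_set: str


class AgentRegistry:
    """Append-only history of each agent principal, keyed by stable agent ID."""

    def __init__(self) -> None:
        self._history: dict[str, list[PrincipalSnapshot]] = {}

    def record(self, agent_id: str, snapshot: PrincipalSnapshot) -> None:
        history = self._history.setdefault(agent_id, [])
        if history and snapshot.effective <= history[-1].effective:
            raise ValueError("snapshots must be appended in time order")
        history.append(snapshot)

    def principal_at(self, agent_id: str, when: datetime) -> PrincipalSnapshot:
        # Return the last snapshot whose effective time is at or before `when`.
        history = self._history.get(agent_id, [])
        index = bisect_right([s.effective for s in history], when)
        if index == 0:
            raise LookupError(f"{agent_id} had no registered identity at {when}")
        return history[index - 1]
```

An investigator holding only the agent ID and the payout timestamp from an action log can then recover the version, approval status, credential scope, and policy set in force at that moment.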

Policy and authorization: broad access versus zero-trust control

Broad local access is useful in a workstation. That is why coding harnesses commonly expose bash, filesystem access, browser control, code execution, and nearby tools. The design pressure favors speed and convenience. The harness is trying to make the workspace legible and actionable with minimal friction.

That model becomes structurally wrong when the agent is no longer acting inside a bounded local workspace. Enterprise agents touch production systems, customer records, financial ledgers, regulated documents, and approval steps. In that setting, broad ambient access gives the agent too much latent authority, often inherited from a surrounding runtime or a shared service account, and nothing evaluates whether the current action is appropriate in the current context.

A coding-style extension usually puts authorization at the perimeter rather than at the action boundary. The agent can call a system because the harness made that capability available once, not because this particular action is permitted now. That is a serious mismatch for enterprise work, where legitimacy changes by case, role, business purpose, policy condition, and workflow state. An action that is valid during intake may be invalid after an exception has been raised or while a required approval is still pending. A system-level grant cannot express that.

An enterprise harness has to enforce least privilege and per-action policy evaluation. Every request to use a tool, call an API, read a dataset, write a record, or delegate to another agent has to be checked against identity, role, task, business purpose, system boundary, and policy state. The harness has to know not only whether the agent can ever use a tool, but whether it can use it here, now, for this case, and for this reason. This is where zero trust becomes a runtime property rather than a slogan.
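A minimal sketch of per-action evaluation follows, assuming a simple in-memory policy snapshot. `ActionRequest`, `Policy`, and `evaluate` are hypothetical names; a real harness would delegate this decision to a policy engine and include more dimensions, such as system boundary and policy state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionRequest:
    """One proposed action, described in the terms policy needs (hypothetical)."""

    agent_id: str
    role: str
    tool: str
    case_id: str
    purpose: str


@dataclass
class Policy:
    """A toy policy snapshot; a real harness would query a policy engine."""

    approved_agents: set[str]
    role_tools: dict[str, set[str]]
    case_assignments: dict[str, set[str]]
    permitted_purposes: set[str]


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def evaluate(request: ActionRequest, policy: Policy) -> Decision:
    """Evaluate a single action at the moment it is requested, not at startup."""
    if request.agent_id not in policy.approved_agents:
        return Decision(False, "unknown or unapproved principal")
    if request.tool not in policy.role_tools.get(request.role, set()):
        return Decision(False, f"role {request.role!r} may not use {request.tool!r}")
    if request.case_id not in policy.case_assignments.get(request.agent_id, set()):
        return Decision(False, f"agent is not assigned to case {request.case_id!r}")
    if request.purpose not in policy.permitted_purposes:
        return Decision(False, f"purpose {request.purpose!r} is not permitted")
    return Decision(True, "identity, role, case, and purpose checks passed")
```

Because `evaluate` runs on every request, a change to case assignments or permitted purposes takes effect on the very next action, which a credential granted once at startup cannot do.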

Consider a remediation agent working a consumer-bank dispute case. It needs access to card transaction history in the ledger system, customer profile data in the CRM, and complaint notes in the case platform. In a coding-style harness, the API credentials for all three systems may simply be present whenever the agent runs. The agent can continue using them even after the case reaches a state where Regulation E review requires human approval before customer-facing action. In a zero-trust harness, access is scoped to the current case, step, role, and policy condition. If the workflow crosses an approval boundary, the harness denies the action or routes it to approval before execution. That moves policy from observation to control.
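The approval boundary in this example can be sketched as a gate that every tool call must pass through. The step names, action names, and rules below are hypothetical and are not a statement of what Regulation E actually requires; `ActionGate` and `submit` are illustrative names. The key property is that a customer-facing action at a step requiring sign-off is queued for a human instead of executed.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Outcome(Enum):
    EXECUTED = "executed"
    PENDING_APPROVAL = "pending_approval"
    DENIED = "denied"


# Hypothetical workflow rules for a dispute case: which actions each step allows,
# and which steps need human sign-off before any customer-facing action.
STEP_ACTIONS = {
    "investigation": {"ledger.read", "crm.read", "cases.read"},
    "customer_response": {"cases.read", "crm.send_letter", "ledger.post_credit"},
}
CUSTOMER_FACING = {"crm.send_letter", "ledger.post_credit"}
STEPS_REQUIRING_APPROVAL = {"customer_response"}


@dataclass
class DisputeCase:
    case_id: str
    step: str
    approved_steps: set[str] = field(default_factory=set)


@dataclass
class ActionGate:
    """Sits between the agent and its tools; every call goes through submit()."""

    approval_queue: list[tuple[str, str]] = field(default_factory=list)

    def submit(self, case: DisputeCase, action: str, run: Callable[[], object]) -> Outcome:
        if action not in STEP_ACTIONS.get(case.step, set()):
            return Outcome.DENIED  # not valid at this step at all
        if (action in CUSTOMER_FACING
                and case.step in STEPS_REQUIRING_APPROVAL
                and case.step not in case.approved_steps):
            # Route to approval instead of executing; nothing has happened yet.
            self.approval_queue.append((case.case_id, action))
            return Outcome.PENDING_APPROVAL
        run()
        return Outcome.EXECUTED
```

The same credential could sit behind `run` in every case; what changes the outcome is the case step and approval state at the moment of the call.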

State management: local artifacts versus durable process state

A multi-step approval workflow exposes the difference quickly. Suppose an enterprise agent assembles vendor onboarding documents, routes them for sanctions review, waits for legal approval, and then resumes to complete supplier setup. If its continuity depends on a local progress file or embedded note, the process looks resumable from inside the runtime and fragile from everywhere else. A human reviewer may not see the current hold reason. A second agent may not know what remains pending. The workflow engine may not know that the agent is waiting on counsel review at all. That fragility becomes visible when you compare it to the persistence model it was extended from.

That kind of local persistence works well for coding harnesses. Files, git history, checkpoints, progress notes, and workspace documents let the next session resume from where the prior one stopped. For coding and personal tasks, the state is close to the work, the project is bounded, and the artifacts live in the same environment the agent already uses. The harness solves a real continuity problem because the model has no durable memory of its own.

Enterprise state is different in kind. It is not only a convenience for continuation. It is part of the process itself. The harness has to preserve case status, prior decisions, pending approvals, deadlines, exceptions, conversation continuity, and the history of actions taken over time. That state may need to remain visible across multiple services, other agents, and human operators. It may have compliance implications. It may determine whether a later action is legal, valid, or timely.

If the coding pattern is extended here, the result is fragile and often invisible state. The agent stores progress in local notes, embedded memory files, or runtime artifacts that work for one task instance but do not constitute a durable operational record. Another agent instance may not see them. A human reviewer may not be able to inspect them. A workflow engine may not know they exist. The organization ends up with state that helps the agent continue but does not help the institution govern the process.

An enterprise harness has to treat state as durable process state tied to the business object. In the vendor-onboarding example, the authoritative record should show the supplier ID, the current workflow step, the sanctions-review outcome, the pending legal approval, the next due date, and the explanation for the hold. The agent can still use private working memory for local reasoning, but the authoritative process state has to live in governed stores with visibility, lifecycle management, and shared access patterns appropriate to the process. That is what makes the work resumable, reviewable, and governable.
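A minimal sketch of that idea, assuming a relational table stands in for the governed store. `sqlite3` is used only so the example runs on its own (the upsert syntax needs SQLite 3.24 or later); the table, column, and function names are hypothetical and mirror the fields listed in the vendor-onboarding example.

```python
import sqlite3
from datetime import date
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS onboarding_case (
    supplier_id      TEXT PRIMARY KEY,
    workflow_step    TEXT NOT NULL,
    sanctions_result TEXT,
    pending_approval TEXT,
    next_due         TEXT,
    hold_reason      TEXT
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_case(conn: sqlite3.Connection, supplier_id: str, step: str,
              sanctions_result: Optional[str], pending_approval: Optional[str],
              next_due: date, hold_reason: Optional[str]) -> None:
    """Upsert the authoritative state for one supplier, keyed by the business object."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            """
            INSERT INTO onboarding_case VALUES (?, ?, ?, ?, ?, ?)
            ON CONFLICT(supplier_id) DO UPDATE SET
                workflow_step = excluded.workflow_step,
                sanctions_result = excluded.sanctions_result,
                pending_approval = excluded.pending_approval,
                next_due = excluded.next_due,
                hold_reason = excluded.hold_reason
            """,
            (supplier_id, step, sanctions_result, pending_approval,
             next_due.isoformat(), hold_reason),
        )


def cases_waiting_on(conn: sqlite3.Connection, approver: str) -> list[tuple]:
    """Any agent, reviewer, or workflow engine can ask the same question."""
    return conn.execute(
        "SELECT supplier_id, hold_reason, next_due FROM onboarding_case "
        "WHERE pending_approval = ? ORDER BY next_due",
        (approver,),
    ).fetchall()
```

The agent may still keep scratch notes for its own reasoning, but the hold reason and pending approval now live where a human reviewer or a second agent can query them.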

Failure handling: local correction versus governed recovery

Failure handling is where the workstation pattern becomes actively dangerous. In a coding harness, a failed tool call usually means retry, revise, rerun tests, or restore from a checkpoint. That pattern works because most failures are local to the workspace and can be corrected by more execution against the same artifact.

Enterprise failures are different. They may involve inconsistent state across services, missing approvals, duplicate events, broken handoffs, regulatory deadlines, or partial side effects that cannot be safely retried without coordination. A local retry may make the situation worse rather than better because the failure is no longer confined to the runtime. It is part of the business process.

A common example is a two-system financial update. An agent posts a refund adjustment to the payment ledger, then fails while updating the customer account platform that drives statements and balances. If the harness simply retries the whole transaction because that is how it handles error recovery, the ledger may receive a duplicate adjustment while the customer account remains out of sync. The result is not just a failed task. It is a reconciliation problem with financial and control implications.

An enterprise harness therefore needs governed recovery. It has to support idempotent retry, task suspension, human escalation, approval requests, workflow rerouting, and compensating actions. In the refund example, the harness should record the partial outcome, block unsafe retries, detect whether the ledger write was already committed, and route the case into recovery. That may mean opening an operations task, invoking a compensating transaction, or requiring finance-operations approval before proceeding. The difference is fundamental. A coding harness treats failure as something to correct locally. An enterprise harness treats failure as a stateful process event that has to preserve auditability and process integrity.
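The refund example can be sketched with an idempotency key and an explicit recovery state. This is a hypothetical illustration: a toy in-memory ledger stands in for the payment ledger, the names (`Ledger`, `RefundCase`, `process_refund`) are assumptions, and operations tasks or compensating transactions are reduced to a recovery task list. It shows three behaviors from the paragraph above: recording the partial outcome, making the ledger write safe to repeat, and blocking a resume until someone approves it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class CaseStatus(Enum):
    COMPLETED = "completed"
    RECOVERY_REQUIRED = "recovery_required"


@dataclass
class Ledger:
    """Toy payment ledger that deduplicates writes by idempotency key."""

    entries: dict[str, int] = field(default_factory=dict)

    def post(self, idempotency_key: str, amount_cents: int) -> bool:
        # A repeated key is a no-op, so a retry cannot create a duplicate adjustment.
        if idempotency_key in self.entries:
            return False
        self.entries[idempotency_key] = amount_cents
        return True


@dataclass
class RefundCase:
    case_id: str
    amount_cents: int
    ledger_committed: bool = False
    status: Optional[CaseStatus] = None
    resumed_by: Optional[str] = None
    recovery_tasks: list[str] = field(default_factory=list)


def process_refund(case: RefundCase, ledger: Ledger,
                   update_account: Callable[[str, int], None],
                   approved_by: Optional[str] = None) -> CaseStatus:
    if case.status is CaseStatus.RECOVERY_REQUIRED:
        if approved_by is None:
            # Block unsafe retries: a case in recovery resumes only with sign-off.
            raise PermissionError(f"{case.case_id} is in recovery; approval required")
        case.resumed_by = approved_by

    # Same key on every attempt; post() is a no-op if the write already committed.
    ledger.post(f"refund:{case.case_id}", case.amount_cents)
    case.ledger_committed = True  # record the partial outcome immediately

    try:
        update_account(case.case_id, case.amount_cents)
    except Exception as exc:
        # Do not retry the whole transaction; park the case for governed recovery.
        case.status = CaseStatus.RECOVERY_REQUIRED
        case.recovery_tasks.append(f"reconcile customer account for {case.case_id}: {exc}")
        return case.status

    case.status = CaseStatus.COMPLETED
    return case.status
```

The idempotency key keeps the ledger correct even if the whole function runs twice; the recovery state keeps the process correct by making the second run a governed decision rather than an automatic loop.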

The harness layers are not independent

These harness layers work only as an integrated system. Identity enables authorization because the policy engine needs a known principal. Authorization constrains tool access because the control plane cannot safely expose capabilities without scoped permissions. State management and orchestration depend on those controls because long-running workflows need durable, governed state rather than ad hoc local memory. Observability depends on identity and state because the enterprise has to know who acted and what happened. Explainability depends on observability and policy because it has to reconstruct not only the action but the reason and governing rule behind it.
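The dependency chain in this paragraph can be made explicit as a graph. This sketch uses Python's standard `graphlib` module (Python 3.9+); the capability names are a simplification of the paragraph, with "policy" folded into `authorization`, and the helper functions are hypothetical.

```python
from graphlib import TopologicalSorter

# Capability -> capabilities it depends on, simplified from the paragraph above.
HARNESS_DEPENDENCIES: dict[str, set[str]] = {
    "identity": set(),
    "authorization": {"identity"},
    "tool_access": {"authorization"},
    "state_and_orchestration": {"authorization", "tool_access"},
    "observability": {"identity", "state_and_orchestration"},
    "explainability": {"observability", "authorization"},
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which to put capabilities in place.

    TopologicalSorter raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def missing_foundations(deps: dict[str, set[str]], present: set[str]) -> dict[str, set[str]]:
    """For each capability a harness has, list the dependencies it lacks."""
    return {cap: deps[cap] - present for cap in present if deps[cap] - present}
```

Running `missing_foundations` on a harness that has tool access and audit logs but no identity or authorization reports exactly those gaps: `tool_access` lacks `authorization`, and `observability` lacks `identity` and durable state.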

The architectural implication is direct. Adding identity to a coding harness does not produce an enterprise harness if the authorization model, state model, and failure model remain workstation-shaped. Adding a workflow engine does not solve the problem if the agent still operates with broad ambient authority and local-only memory. Adding audit logs does not create trust if the system cannot tie them to a stable principal or reconstruct a governed recovery path. Enterprise harness design is not a checklist of features. It is a dependency structure.

This also explains why many enterprise agent efforts feel haphazard. Teams add prompts, tools, memory files, retries, and workflow logic in response to immediate failures, but the pieces are built on inconsistent assumptions. Some parts assume a coding assistant. Others assume a business worker. The resulting system may function in demonstrations and fail in production because the architectural layers do not support one another.

Conclusion

The visible literature on harnesses has clarified important mechanics: externalized state, context control, tool mediation, orchestration, verification, and recovery. But much of that literature still reflects the assumptions of coding and personal agents. It is grounded in sessions, workspaces, files, and user-adjacent execution. That is why it transfers only partially to enterprise settings.

Enterprise agents operate inside business processes, across systems, under policy, with persistent identity and institutional consequences. Their harness cannot be a larger workstation scaffold. It has to be a process control plane. The three layers capture that architecture: security, identity, and trust at the base; execution and control in the middle; operating model at the top. The four dimensions examined here show where the coding pattern breaks structurally rather than incrementally.

As agents move into production operations, harness architecture becomes the primary engineering challenge. It becomes more consequential than prompt design, and in many cases more consequential than model selection. The model supplies inference. The enterprise harness decides whether that inference can be turned into governed work.

***

Feel free to reach out and connect with the author, Eric Broda, on LinkedIn. Questions and comments are welcome and encouraged!

***

This article is part of a larger series that addresses a broader suite of topics related to agents (see my full article list). If you like this article, you may wish to check out the upcoming book, “Agentic Mesh”, from O’Reilly, also available on Amazon.

***

All images in this document except where otherwise noted have been created by Eric Broda. All icons used in the images are stock PowerPoint icons and/or are free from copyrights.

The opinions expressed in this article are those of the author(s) alone and do not necessarily reflect the views of my (our) clients.

Looking for more?
👉 Discover the full O’Reilly Agentic Mesh book by Eric Broda and Davis Broda

🎧 Follow co-hosts John Miller and Eric Broda on The Agentic Mesh Podcast on YouTube, Spotify and Apple Podcasts. A new video every week!
