Anthropic's Zero Trust Agents: Authenticated, Sandboxed, and Still Dangerous

Personal and coding agents are giving enterprises a false sense of security.

Jun 03, 2026

The blast radius of a single compromised agent looks local: a bad file edit, an unsafe command, an exposed secret, or a broken test suite. One developer supervising one agent in one workspace feels manageable, and for that case, it largely is. Multiply that by thousands of agents running in engineering, finance, claims, and compliance, and the blast radius is no longer local. It is the whole enterprise.

Each agent can retrieve context, select tools, call APIs, delegate work, and trigger downstream systems, and most enterprises are deploying agents before they have the security controls to govern that behavior. Help Net Security reported the gap directly: “Most organizations planned to deploy agentic AI into business functions, and twenty nine percent reported that they were prepared to secure those deployments.”[1]

The urgent question is whether the security model built around personal and coding agents can survive when those agents become enterprise infrastructure. The International AI Safety Report 2026 states the risk plainly: “AI agents pose heightened risks because they act autonomously, making it harder for humans to intervene before failures cause harm.”[2]

Anthropic’s Zero Trust work is the most concrete answer available. It addresses a problem moving faster than most security programs: agents now read source code, retrieve documents, call tools, edit files, run commands, update systems, and coordinate work across services. It still leaves open the gap that matters most at scale: execution authority.

Authentication establishes the actor, alignment governs model behavior, and tool approval limits available capabilities, but the platform still has to decide whether a requested action belongs inside the current task, with the current context, under the current policy.

Figure 1, Anthropic: Authenticated, Sandboxed, and Still Dangerous

Agent security is moving through three stages.

The first stage is the coding or personal agent working inside available context. A developer opens a repository and asks an agent to explain a module, fix a bug, write tests, or update documentation. The repository gives the agent a bounded working set. That feels lower risk because the context is present, but source code is proprietary and security-sensitive. The repository may also include secrets, environment files, internal endpoints, and deployment configuration.

The second stage is the context-aware agent. It reaches beyond the repository into enterprise knowledge: design documents, tickets, architecture notes, Slack threads, incident reports, logs, API specifications, customer examples, database schemas, and policy documents. The agent may need only a narrow slice of that material, while connected systems expose far more. A bug fix may require a production log or a customer case. The agent’s context window can aggregate sensitive data.

The third stage is the headless enterprise agent. It participates in a business process: invoice validation, claims triage, account reconciliation, exception handling, regulatory reporting, updates, or approvals. A manipulated coding agent may damage a repository. A manipulated context-aware agent may expose sensitive data. A manipulated headless enterprise agent may move money, change records, or create audit failures.

Across these stages, the security boundary moves from local code access, to governed retrieval, to business action.

The Attack Surface Extends to Execution

Prompt injection remains a real threat, but agent security has to cover more than model behavior.

For coding agents, the attack surface begins in the development environment. The agent may use filesystem access, shell commands, package managers, Git, build tools, and internet access. Hostile instructions can appear in README files, issue descriptions, comments, dependencies, generated files, test data, or external documentation.

For context-aware agents, retrieval becomes part of the attack surface. The agent may search document repositories, ticketing systems, chat systems, observability platforms, source-control history, knowledge bases, SaaS applications, and internal APIs. Security teams now have to manage excess retrieval, mixed trust levels, private data exposure, and use of sensitive context for tasks that did not require it. Simon Willison’s lethal-trifecta model names the dangerous combination: “Access to your private data,” “Exposure to untrusted content,” and “The ability to externally communicate in a way that could be used to steal your data.”[3]
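The trifecta can be treated as a configuration check before an agent is deployed. The sketch below is illustrative only: the `AgentCapabilities` type and its three flags are hypothetical names that mirror Willison's three conditions, not part of any real framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical capability flags for one agent configuration."""
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    # The combination applies only when all three conditions hold at once.
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_communicate_externally
    )


# Illustrative configurations, not real products.
triage_agent = AgentCapabilities(True, True, True)
offline_summarizer = AgentCapabilities(True, True, False)
```

Removing any one of the three capabilities, as the second configuration does, breaks the combination. That makes the check useful as a design constraint as well as an audit question.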

For headless enterprise agents, business execution becomes the main risk. The agent may update records, trigger downstream systems, send messages, approve steps, create tickets, or invoke payment workflows. Security has to govern the path from instruction to context selection to tool execution to audit evidence.

Traditional controls help, but they leave gaps along that path. Authentication proves the actor, RBAC and ABAC define broad permission structures, sandboxes limit runtime behavior, and logs record activity. Execution authority still has to be decided for one agent, one action, one task instance, and one set of data.

What Anthropic Gets Right

Anthropic’s Zero Trust work moves agent security out of vague concern and into practical architecture. Its own framing captures the core problem: “Traditional access controls won’t prevent agents from misusing legitimate permissions, and monitoring needs to account for attacks designed to succeed through persistence rather than exploitation.”[4] It names the right threat categories: prompt injection, tool misuse, tool poisoning, privilege abuse, memory poisoning, context poisoning, supply-chain compromise, and delegation risk. It also gives enterprises a mature baseline for agent controls.

The strongest controls in that baseline are worth preserving. Cryptographic agent identity improves attribution. Short-lived credentials reduce the value of stolen secrets. Least agency limits what agents can do. Sandboxing reduces blast radius. Tool allow-listing and parameter validation constrain tool use. Immutable audit trails and distributed tracing support investigation. Memory isolation and context integrity checks address long-term poisoning risk. Signed configurations and AI-BOM practices bring supply-chain discipline into the agent stack.

Anthropic’s “impossible vs. tedious” design test also applies to governed execution. Agentic attackers can repeat low-cost attempts at scale, so controls based on friction will erode. Short grant lifetimes reduce the window for reuse. Executor-side validation removes unauthorized paths from the agent’s reach. Context policy prevents relevance ranking from becoming a substitute for permission. Derived grants prevent delegation from spreading the parent’s full authority.

Figure 2, From Secure Runtime to Governed Execution

The Missing Boundary: Execution Authority

Anthropic’s model is strongest around the agent runtime: identity, credentials, tools, memory, sandboxing, logging, configuration, supply chain, and response. Enterprises also need a boundary for execution authority.

For coding agents, the gap is raw capability. The agent may have broad filesystem access, shell access, package manager access, Git access, and internet access. A safer design separates low-risk inspection from file writes, package installs, arbitrary shell commands, and pushes.
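One way to picture that separation is a tier map that classifies each operation before it runs. This is a minimal sketch under assumed names: the operation labels and the three tiers are hypothetical, and a real platform would define its own.

```python
from enum import Enum


class Tier(Enum):
    INSPECT = "inspect"    # read-only, low risk
    MUTATE = "mutate"      # changes workspace state
    ESCALATE = "escalate"  # needs explicit approval


# Hypothetical mapping of coding-agent operations to risk tiers.
OPERATION_TIERS = {
    "read_file": Tier.INSPECT,
    "search_repo": Tier.INSPECT,
    "write_file": Tier.MUTATE,
    "install_package": Tier.ESCALATE,
    "run_shell": Tier.ESCALATE,
    "git_push": Tier.ESCALATE,
}


def tier_for(operation: str) -> Tier:
    # Unknown operations fall into the most restrictive tier by default.
    return OPERATION_TIERS.get(operation, Tier.ESCALATE)
```

Defaulting unknown operations to the strictest tier keeps new capabilities from becoming low-risk by omission.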

For context-aware agents, the gap is context authority. The agent may need enterprise context, but only some context is appropriate for the task. Retrieval policy should account for data classification, user role, workspace, task purpose, source trust, retention policy, and output constraints.

For headless enterprise agents, the gap is business authority. The agent may be authenticated, sandboxed, and limited to an approved tool list. It still needs explicit permission to invoke a skill or tool that changes business state.

The control decision has to authorize the execution itself, with the actor and tool treated as inputs to that decision.

Trust Authority and Execution Grants

A Trust Authority is the control point that decides whether an agent should receive authority for a specific action.

For a coding agent, it evaluates file operations, shell commands, package installs, Git operations, and cloud CLI use. For a context-aware agent, it evaluates document retrieval, log access, customer context, and source combinations. For a headless enterprise agent, it evaluates business APIs, system-of-record updates, external messages, exception escalations, and delegation.

The Trust Authority turns authorization into a runtime decision. It evaluates agent identity, user profile, workspace profile, task purpose, process state, data classification, skill identity, tool identity, risk, approval state, and policy. The agent asks for authority. The platform issues it.
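A runtime decision of this kind can be sketched as a function over the request and the task policy. Everything here is an assumption for illustration: the field names, the four data classes, and the three checks cover only a subset of the inputs listed above.

```python
from dataclasses import dataclass

# Hypothetical ordering of data classes, least to most sensitive.
CLASS_ORDER = ("public", "internal", "confidential", "restricted")


@dataclass(frozen=True)
class AuthorityRequest:
    agent_id: str
    task_purpose: str
    action: str
    data_classification: str
    human_approved: bool = False


@dataclass(frozen=True)
class TaskPolicy:
    allowed_actions: frozenset    # actions permitted for this task
    max_classification: str       # highest data class the task may touch
    approval_required: frozenset  # actions that need a human sign-off


def decide(req: AuthorityRequest, policy: TaskPolicy) -> tuple:
    """Return (allowed, reason). The agent asks; the platform decides."""
    if req.action not in policy.allowed_actions:
        return False, "action is outside the task policy"
    if req.data_classification not in CLASS_ORDER:
        return False, "unknown data classification"
    requested = CLASS_ORDER.index(req.data_classification)
    ceiling = CLASS_ORDER.index(policy.max_classification)
    if requested > ceiling:
        return False, "data classification exceeds policy"
    if req.action in policy.approval_required and not req.human_approved:
        return False, "human approval required"
    return True, "grant may be issued"
```

The function returns a reason alongside the decision so that a denial can be recorded with the same detail as an approval.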

A short-lived execution grant binds that decision to a specific operation. It should encode the initiating user or process, agent identity, workspace, task or process instance, permitted action, resource boundary, parameter constraints, approval state, expiration, and audit correlation ID.

In a coding workflow, a grant might authorize “run unit tests in this repository for the next five minutes” or “modify files only under /src/parser and /tests/parser.” In a context-aware workflow, it might authorize “read incident logs for service Y for the last 24 hours, excluding customer payloads.” In a business process, it might authorize “invoke invoice-validation skill for invoice 123” or “submit exception for human review.”
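The coding-workflow example can be expressed as a small data structure. The sketch below is one possible shape, not a standard format: the `ExecutionGrant` class, its field names, and the `permits` method are hypothetical, and it covers only the path-scoped case.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ExecutionGrant:
    initiator: str
    agent_id: str
    workspace: str
    task_id: str
    action: str
    allowed_paths: tuple   # resource boundary
    expires_at: datetime
    correlation_id: str    # ties the grant to audit records

    def permits(self, action: str, path: str, now: datetime) -> bool:
        if now >= self.expires_at or action != self.action:
            return False
        target = PurePosixPath(path)
        # Reject traversal segments, since pure paths do not resolve "..".
        if ".." in target.parts:
            return False
        return any(target.is_relative_to(root) for root in self.allowed_paths)


issued = datetime(2026, 6, 3, 12, 0, tzinfo=timezone.utc)
grant = ExecutionGrant(
    initiator="dev@example.com",
    agent_id="coding-agent-7",
    workspace="parser-repo",
    task_id="task-42",
    action="modify_file",
    allowed_paths=("/src/parser", "/tests/parser"),
    expires_at=issued + timedelta(minutes=5),
    correlation_id="corr-0001",
)
```

With this grant, a write to `/src/parser/lexer.py` inside the five-minute window is permitted, while a write to `/src/billing/api.py`, a traversal path, or any request after expiry is refused.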

The grant converts general policy into bounded authority for the operation in front of the system.

Enforcement Belongs Outside the Agent

A Tool Executor performs privileged operations only after validating a grant. The agent presents the grant. The executor verifies it, validates parameters, applies sandbox and network policy, executes the operation, records evidence, and returns the result.
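The verify-then-execute sequence might look like the following. This is a sketch under stated assumptions: grants are plain dictionaries signed with HMAC-SHA256, the signing key is a demo value held only by the platform, and the claim names (`tool`, `params`, `expires_at`, `correlation_id`) are hypothetical.

```python
import hashlib
import hmac
import json

# Assumption: a demo signing key known to the platform, never to the agent.
SIGNING_KEY = b"demo-key-for-illustration-only"


def sign_grant(claims: dict) -> str:
    payload = json.dumps(claims, sort_keys=True).encode("utf-8")
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def execute(claims: dict, signature: str, tool: str, params: dict, now: float) -> dict:
    """Validate the grant first; only then would the tool be run."""
    def denied(reason: str) -> dict:
        return {"status": "denied", "reason": reason}

    if not hmac.compare_digest(sign_grant(claims), signature):
        return denied("invalid grant signature")
    if now >= claims["expires_at"]:
        return denied("grant expired")
    if tool != claims["tool"]:
        return denied("tool not covered by grant")
    if params != claims["params"]:
        return denied("parameters outside grant")
    # A real executor would apply sandbox and network policy and run the tool here.
    return {"status": "executed", "correlation_id": claims["correlation_id"]}
```

Because the executor recomputes the signature itself, an agent that edits the claims to name a different tool is refused before any parameter is examined.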

For coding agents, the executor should expose structured capabilities before raw shell access: inspect file, apply patch, run test suite, search repository, and open pull request. Raw shell should be constrained, time-limited, and logged.

For context-aware agents, the executor should mediate retrieval. It should enforce source permissions, query constraints, data classification, redaction, retrieval limits, and output controls. The agent should not directly browse every connected repository, knowledge store, or SaaS application.
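A mediated retrieval step could be sketched as a filter that sits between the agent and its sources. The `Document` type, the source and classification labels, and the email-only redaction rule are all simplifying assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str
    classification: str
    text: str


# Deliberately simple pattern; real redaction needs more than one regex.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mediated_retrieve(docs, allowed_sources, allowed_classes, limit):
    """Return redacted text from permitted documents, up to `limit` items."""
    results = []
    for doc in docs:
        if len(results) >= limit:
            break
        if doc.source not in allowed_sources:
            continue  # source permission check
        if doc.classification not in allowed_classes:
            continue  # data classification check
        results.append(EMAIL.sub("[REDACTED]", doc.text))
    return results
```

The agent only ever sees the returned list, so documents from other sources or higher classifications never enter its context window.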

For headless enterprise agents, the executor should mediate operational tools and APIs. The agent should not call sensitive systems directly. The executor is where policy, grant, tool schema, and audit evidence converge. That includes protecting the tool schema itself; the executor should not expose a tool’s full parameter structure to an agent that lacks access to that tool.

The same governance model has to cover the skills and context that shape the agent’s decisions before any tool call is made.

Skills and Context as Governed Assets

Skills and tools are sensitive enterprise assets as well as security surfaces. A mature skill encodes accumulated IP: operational procedures, regulatory interpretations, exception-handling logic, and customer treatment rules refined over years. A tool’s schema, parameter structure, and capability description reveal what an enterprise can do and how its systems are connected: which integrations exist, which internal APIs are exposed, and which operational capabilities have been built or licensed. Skill exposure is an IP risk. Tool schema exposure is a reconnaissance risk because it reveals operational capabilities before any specific misuse occurs.

A Skills Registry should treat skills as controlled assets with owner, version, provenance, required tools, risk rating, certification state, and revocation path. Tools require the same treatment: identity, permitted agents, schema access controls, risk rating, and revocation path. Without registry controls on both, a skill is exposed through every agent that uses it, while a tool’s schema becomes visible to any agent that can query it.
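As a rough illustration, a registry entry and its lookup rule could be modeled as below. The `SkillRecord` fields follow the list above, but the class names, the `resolve` method, and the rule that an agent must already hold every required tool are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class SkillRecord:
    name: str
    owner: str
    version: str
    provenance: str
    required_tools: tuple
    risk_rating: str
    certified: bool
    revoked: bool = False


class SkillsRegistry:
    def __init__(self):
        self._skills = {}

    def register(self, record: SkillRecord) -> None:
        self._skills[(record.name, record.version)] = record

    def revoke(self, name: str, version: str) -> None:
        self._skills[(name, version)].revoked = True

    def resolve(self, name: str, version: str, agent_tools: set):
        """Return the skill only if it is certified, live, and usable by this agent."""
        record = self._skills.get((name, version))
        if record is None or record.revoked or not record.certified:
            return None
        if not set(record.required_tools) <= agent_tools:
            return None
        return record
```

Keying on name and version means that revoking one version of a skill does not silently disable or re-enable another.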

Coding-agent skills may refactor modules, generate tests, or review code. Context-aware skills may retrieve architecture context or summarize incident history. Headless enterprise skills may validate invoices, triage claims, screen sanctions, or handle payment exceptions.

Context governance matters for the same reason. The system must decide which context may be retrieved, from which sources, under which authority, with which redactions, and for how long. A document can be relevant and still inappropriate for the user, task, workspace, jurisdiction, confidentiality class, or output channel.

Context governance should include source identity, document classification, retrieval limits, redaction rules, and output restrictions. Without those checks, context engineering creates a data-loss path.

Profiles, Process Definitions, and Delegation

Open-ended personal and coding agents need a policy anchor because they often operate without a formal process definition. A user profile, workspace profile, or role profile should define the maximum authority envelope, including approval-required tools, command patterns, secret scopes, and retention rules.

For coding agents, the profile might allow reading and editing files inside the repository, running tests, and using approved documentation domains. It might require approval for package installation, unusual shell commands, Git push, Docker socket access, cloud CLI use, or access outside the repository. For context-aware agents, the profile might define searchable systems, excluded data classes, customer-data rules, and human approval points.
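A coding-agent profile of this kind can be sketched as two pattern lists, one for allowed operations and one for operations that need approval. The pattern syntax, the action names, and the `repo/` prefix are hypothetical; targets are assumed to be normalized before the check.

```python
from fnmatch import fnmatchcase

# Hypothetical workspace profile expressed as "action:target" glob patterns.
PROFILE = {
    "allowed": [
        "read_file:repo/*",
        "edit_file:repo/*",
        "run_tests:repo/*",
    ],
    "approval_required": [
        "install_package:*",
        "git_push:*",
        "cloud_cli:*",
    ],
}


def check(profile: dict, action: str, target: str) -> str:
    """Classify a request as 'allowed', 'needs_approval', or 'denied'."""
    # Assumes `target` is already normalized (no ".." segments).
    request = f"{action}:{target}"
    # Approval rules are checked first so they cannot be bypassed by an allow rule.
    if any(fnmatchcase(request, p) for p in profile["approval_required"]):
        return "needs_approval"
    if any(fnmatchcase(request, p) for p in profile["allowed"]):
        return "allowed"
    return "denied"
```

Anything the profile does not mention is denied, which keeps the envelope a maximum and not a suggestion.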

That envelope narrows to the current act through the execution grant.

Headless enterprise agents need process definitions because business workflows have known steps, approvals, and exception paths. A process definition should specify the workflow steps, tools, data sources, approval points, segregation-of-duty constraints, and audit requirements.

The process definition narrows what an agent may do during a workflow. A claims agent should not gain access to payment execution because it can access claim records. An invoice agent should not contact vendors or alter payment instructions unless the process authorizes those steps. A customer remediation agent should not issue credits without threshold checks and approvals.
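A process definition can be pictured as a small state machine that names the tools available at each step. The invoice workflow below is invented for illustration; the step names, tool names, and dictionary layout are assumptions, not a real schema.

```python
# Hypothetical process definition for an invoice workflow.
PROCESS = {
    "name": "invoice-validation",
    "steps": {
        "extract": {"tools": {"ocr_reader"}, "next": {"validate"}},
        "validate": {"tools": {"invoice_validator"}, "next": {"approve", "exception"}},
        "exception": {"tools": {"submit_for_review"}, "next": set()},
        "approve": {"tools": {"approval_queue"}, "next": set()},
    },
}


def tool_allowed(process: dict, step: str, tool: str) -> bool:
    # A tool is usable only in the steps that list it.
    return tool in process["steps"].get(step, {}).get("tools", set())


def transition_allowed(process: dict, current: str, nxt: str) -> bool:
    # The workflow may only move along edges the definition declares.
    return nxt in process["steps"].get(current, {}).get("next", set())
```

Because no step lists a payment or vendor-contact tool, an agent working this process cannot reach those capabilities regardless of what its credentials would otherwise permit.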

Process definitions handle known workflows, but agents frequently decompose one task into subtasks dynamically, which creates an authority inheritance problem. When one agent delegates work to another, the receiving agent should receive a derived grant, not inherited authority. The derived grant should preserve the original purpose, initiating authority, resource scope, expiration, and audit lineage while narrowing authority to the delegated subtask.

In coding-agent systems, a primary agent may spawn sub-agents to inspect tests, review logs, analyze dependencies, or propose refactors. Those sub-agents should not receive unrestricted access to the full developer environment. In enterprise systems, an orchestrator may delegate to finance, legal, compliance, operations, or customer-service agents. Each delegated agent should receive only the authority needed for its part of the process, so delegation narrows authority rather than spreading it.
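The narrowing rule for derived grants can be stated in code. This is a minimal sketch with hypothetical names: `Grant`, `derive`, and the set-based action and resource scopes are assumptions chosen to make the subset check explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    grant_id: str
    purpose: str
    initiator: str
    actions: frozenset
    resources: frozenset
    expires_at: float
    lineage: tuple = ()  # ids of ancestor grants, oldest first


def derive(parent: Grant, child_id: str, actions, resources, expires_at: float) -> Grant:
    """Issue a child grant that can only narrow the parent's authority."""
    actions, resources = frozenset(actions), frozenset(resources)
    if not actions <= parent.actions or not resources <= parent.resources:
        raise PermissionError("derived grant may not widen authority")
    return Grant(
        grant_id=child_id,
        purpose=parent.purpose,          # original purpose is preserved
        initiator=parent.initiator,      # as is the initiating authority
        actions=actions,
        resources=resources,
        expires_at=min(expires_at, parent.expires_at),  # never outlives the parent
        lineage=parent.lineage + (parent.grant_id,),
    )
```

A sub-agent asked to inspect tests would receive a child grant covering only `read_file` on the test directory, and a request to add `git_push` would fail at derivation time instead of at execution time.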

Execution Provenance

Logs record what happened. Execution provenance should explain why an action was authorized.

Provenance should record the original request, user or process authority, policy evaluation, grant issuance, context retrieval, skill version, executor decision, and output.
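One way to keep such a record tamper-evident is to chain each entry to the hash of the previous one. Hash chaining is an assumption of this sketch and is not something the article prescribes; the stage names and the `append_event` and `verify` helpers are likewise hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(chain: list, stage: str, detail: dict) -> None:
    """Append one provenance entry linked to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"stage": stage, "detail": detail, "prev": prev}
    chain.append({**body, "hash": _digest(body)})


def verify(chain: list) -> bool:
    """Recompute every hash and link; any edit to an earlier entry fails."""
    prev = GENESIS
    for entry in chain:
        body = {"stage": entry["stage"], "detail": entry["detail"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


chain = []
append_event(chain, "request", {"user": "dev@example.com", "ask": "run unit tests"})
append_event(chain, "policy_evaluation", {"decision": "allow"})
append_event(chain, "grant_issued", {"grant_id": "g1"})
append_event(chain, "executor_decision", {"status": "executed"})
```

A reviewer can walk the chain from request to executor decision and confirm that no step was altered after the fact.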

In coding-agent use, this lets a reviewer see why a command ran, why a file changed, which grant authorized it, and what output was produced. In context-aware use, it shows which sources were retrieved, why they were permitted, what data classifications were involved, what redactions were applied, and what context entered the model. In headless enterprise use, it lets an auditor trace a business action from request or event through process rule, agent action, grant, tool execution, output, and system update.

The Joined Architecture

The three stages use different policy anchors: coding agents use a user or workspace profile, context-aware agents add retrieval policy for enterprise data, and headless enterprise agents use process definitions tied to workflow state. The control pattern stays consistent because each stage still has to turn a request into bounded authority before the agent can act.

Anthropic’s Zero Trust work gives enterprises the right baseline for identity, credentials, tools, memory, supply chain, observability, sandboxing, and response. Enterprises also need controls that authorize context retrieval, skill requests, tool calls, delegation, and business updates before execution.

The durable security boundary for agents is governed execution: an external authority decision, a bounded grant, mediated tool and context access, and evidence that ties the action back to the authority that allowed it.

Endnotes

[1] Help Net Security, “Enterprises are racing to secure agentic AI deployments,” February 23, 2026, https://www.helpnetsecurity.com/2026/02/23/ai-agent-security-risks-enterprise/

[2] International AI Safety Report, “International AI Safety Report 2026,” https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026

[3] Simon Willison, “The lethal trifecta for AI agents: private data, untrusted content, and external communication,” June 16, 2025, https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/

[4] Anthropic, “Zero Trust for AI agents,” May 27, 2026, https://claude.com/blog/zero-trust-for-ai-agents
