How to secure AI agent delegation and multi-agent communication
When Agent A delegates to Agent B, whose permissions apply? Whose audit trail records the action? And what happens when Agent B is compromised?
In September 2025, security researcher Johann Rehberger demonstrated a vulnerability he called Cross-Agent Privilege Escalation. In development environments where multiple AI agents share a codebase (Copilot, Claude, Gemini all configured in the same project), a prompt-injected agent can rewrite another agent's configuration file. Copilot gets injected, modifies Claude's .mcp.json, and on the next invocation Claude runs arbitrary code. Claude then modifies Copilot's configuration. What starts as a single indirect prompt injection becomes a self-reinforcing loop of compromised agents, each reinfecting the other.
Two months later, researchers disclosed Agent Session Smuggling: a pattern where a sub-agent embeds a silent action (in the published demonstration, a stock trade) inside an otherwise routine response. The parent agent processes the response, executes the embedded action, and the trade goes through with no prompt and no visibility to the user.
These aren't theoretical risks. They're demonstrated attack patterns against multi-agent architectures that are already in production. And they expose a fundamental gap: the security model most teams apply to single-agent systems doesn't extend to systems where agents communicate with, delegate to, and depend on other agents.
The earlier articles in this series covered agent identity, supply chain verification, tool invocation policy, and prompt injection containment. Those controls assume a single agent acting within its own permission boundary. This article covers what happens when agents cross that boundary: delegating tasks, passing context, consuming each other's output, and failing in ways that propagate.
The trust problem in multi-agent systems
A single-agent system has a clear trust model. The user trusts the agent. The agent trusts its tools (subject to supply chain verification). The tools trust the agent's credentials, subject to role-based access control (RBAC) and fine-grained authorization (FGA). Every relationship is bilateral and well-defined.
Multi-agent systems introduce transitive trust. Agent A trusts Agent B because they're both part of the same orchestration framework. Agent B trusts Agent C because it was configured as a sub-agent. But Agent A never explicitly authorized Agent C. The trust was inherited through the chain, and with it, the assumptions about what each agent is permitted to do.
This is the confused deputy problem at scale. In the original formulation, a trusted program is tricked into misusing its authority on behalf of an attacker. In multi-agent systems, every delegation is a potential confused deputy scenario. Agent A delegates a task to Agent B. Agent B has its own permissions. Does it execute the task with Agent A's permissions, its own permissions, or some combination? If Agent B has access to systems that Agent A doesn't, the delegation just escalated privileges. If Agent A has higher permissions than Agent B, and Agent B inherits them, the scoping that was supposed to contain Agent B is bypassed.
The Gradient Institute's 2025 report on multi-agent risk put it precisely: a collection of safe agents does not guarantee a safe collection of agents. Each agent might be individually well-scoped, well-tested, and well-behaved. The emergent behavior of the system, the interactions, delegations, and information flows between them, can still produce outcomes none of the individual agents would produce alone.
Authenticating agent-to-agent communication
The first defense is the same one that applies everywhere else in this series: identity. Every agent-to-agent message should be authenticated. The receiving agent should verify the sending agent's identity before processing any request, just as an MCP server verifies an agent's token before executing a tool.
In practice, this means each agent in a multi-agent system has its own client credentials (as described in the credentials guide), and every inter-agent request carries a signed token:
Without this, any process that can craft a message in the expected format can impersonate an agent. The Cross-Agent Privilege Escalation attack works precisely because there's no authentication between agents. One agent modifies another's config, and the system accepts it because it assumes anything in the shared codebase is trusted.
The delegation permission model
Authentication tells you who is sending the message. Authorization tells you whether that agent is allowed to delegate this specific task. The permission model for delegation needs to answer three questions:
Who can delegate to whom? Not every agent should be able to call every other agent. Define explicit delegation paths. The support agent can delegate to the knowledge base agent but not to the deployment agent. The orchestrator can delegate to all agents, but sub-agents cannot delegate to each other without going through the orchestrator.
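One way to express explicit delegation paths is a deny-by-default allow-list. The sketch below mirrors the example in the paragraph above; the agent names and the `ALLOWED_DELEGATIONS` map are hypothetical.

```python
# Hypothetical delegation map: each sender lists the agents it may call.
# Sub-agents have empty sets, so they must route through the orchestrator.
ALLOWED_DELEGATIONS = {
    "orchestrator": {"support-agent", "kb-agent", "deployment-agent"},
    "support-agent": {"kb-agent"},
    "kb-agent": set(),
    "deployment-agent": set(),
}


def can_delegate(sender, receiver):
    """Deny by default: unknown senders and unlisted paths are rejected."""
    return receiver in ALLOWED_DELEGATIONS.get(sender, set())
```

With this map, `can_delegate("support-agent", "kb-agent")` is allowed, while `can_delegate("support-agent", "deployment-agent")` is not.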
What permissions does the receiving agent operate with? This is the critical design decision. The receiving agent's effective permissions should be the intersection of its own role and the originating authorization scope. Agent B can never exceed its own role permissions, and it can never exceed the permissions that the originating user authorized. Both checks must pass.
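The intersection rule can be sketched as two independent checks that must both pass. Everything named here is an assumption for illustration: `AGENT_ROLE_PERMISSIONS` stands in for a role system, and `USER_GRANTS` stands in for a call to an FGA service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegatedTask:
    originating_user: str
    receiving_agent: str
    action: str    # for example "read"
    resource: str  # for example "doc:handbook"


# Hypothetical role permissions for each agent.
AGENT_ROLE_PERMISSIONS = {
    "kb-agent": {"read"},
    "deployment-agent": {"read", "deploy"},
}

# Hypothetical stand-in for an FGA service: (user, action, resource)
# tuples that the originating user is authorized for.
USER_GRANTS = {
    ("alice", "read", "doc:handbook"),
    ("alice", "deploy", "svc:staging"),
}


def agent_role_allows(agent, action):
    """Check 1: the receiving agent's own role permits the action."""
    return action in AGENT_ROLE_PERMISSIONS.get(agent, set())


def user_authorized(user, action, resource):
    """Check 2: the originating user's authorization (the FGA check)."""
    return (user, action, resource) in USER_GRANTS


def authorize_delegated_action(task):
    """Effective permission is the intersection: both checks must pass."""
    return (agent_role_allows(task.receiving_agent, task.action)
            and user_authorized(task.originating_user, task.action, task.resource))
```

For example, `kb-agent` acting for `alice` may read `doc:handbook`, but it cannot deploy to `svc:staging` even though `alice` could, because the agent's own role does not include `deploy`.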

This double check prevents the two most common delegation vulnerabilities. If Agent B has broader permissions than Agent A (privilege escalation through delegation), the originating user check blocks it. If Agent A tries to use Agent B to access a resource outside the user's scope (scope bypass through delegation), the FGA check blocks it.
How deep can the chain go? Unbounded delegation chains are dangerous. Agent A delegates to Agent B, which delegates to Agent C, which delegates to Agent D. Each hop adds latency, dilutes context, and creates another point where permissions might escalate or errors might compound. Set a maximum delegation depth and enforce it:
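A minimal sketch of depth enforcement, assuming the delegation chain travels with the task as a tuple of agent names starting at the originator. The limit of 3 and the `extend_chain` helper are illustrative choices, not recommended values.

```python
MAX_DELEGATION_DEPTH = 3  # illustrative limit; tune for your system


class DelegationDepthExceeded(Exception):
    pass


def extend_chain(chain, next_agent, max_depth=MAX_DELEGATION_DEPTH):
    """Return the chain with next_agent appended, or raise if too deep.

    chain is a non-empty tuple of agent names beginning with the
    originating agent, so the hop count after appending equals len(chain).
    """
    hops_after_append = len(chain)
    if hops_after_append > max_depth:
        raise DelegationDepthExceeded(
            "delegation to %s would be hop %d, limit is %d"
            % (next_agent, hops_after_append, max_depth)
        )
    return chain + (next_agent,)
```

The check runs before the delegation happens, so the fourth hop in this configuration is refused rather than logged after the fact.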
Validating inter-agent messages

Authentication and authorization aren't enough. The content of inter-agent messages also needs validation. Agent Session Smuggling works because the parent agent processes the sub-agent's response without checking whether it contains embedded actions.
Every inter-agent message should be validated at the receiving end:
- Schema validation. Define a schema for what a valid response looks like for each task type. If the support agent delegates a knowledge base lookup, the response should contain text content and source references. If it contains tool invocations, action requests, or instructions, something is wrong.
- Injection scanning. Scan inter-agent messages for content that looks like instructions rather than data. The same patterns described in the supply chain article apply here: phrases like "ignore previous instructions," "override your system prompt," or encoded variants should be flagged.
- Size and rate limits. A sub-agent returning a response that's ten times larger than expected is suspicious. A sub-agent that responds with a burst of follow-up messages is suspicious. Set bounds on response size and frequency.
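The first three checks above can be sketched for a knowledge base lookup as follows. The field names, the size limit, and the two patterns are assumptions chosen for illustration; plain pattern matching will not catch encoded variants, and rate limiting is left out because it needs state across messages.

```python
import re

MAX_RESPONSE_CHARS = 20000  # illustrative bound on response size
ALLOWED_FIELDS = {"content", "sources"}  # hypothetical schema for a lookup

# Plain-text patterns only; encoded variants need additional decoding steps.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"override\s+your\s+system\s+prompt", re.IGNORECASE),
]


def validate_kb_response(message):
    """Return a list of problems; an empty list means the response passed."""
    if not isinstance(message, dict):
        return ["response is not an object"]
    problems = []

    # Schema: anything beyond text and sources (tool calls, actions) is rejected.
    unexpected = set(message) - ALLOWED_FIELDS
    if unexpected:
        problems.append("unexpected fields: %s" % sorted(unexpected))
    content = message.get("content")
    if not isinstance(content, str):
        problems.append("content must be a string")
        content = ""
    sources = message.get("sources")
    if not (isinstance(sources, list) and all(isinstance(s, str) for s in sources)):
        problems.append("sources must be a list of strings")

    # Size limit.
    if len(content) > MAX_RESPONSE_CHARS:
        problems.append("content exceeds size limit")

    # Injection scan: flag content that reads like instructions.
    for pattern in INJECTION_PATTERNS:
        if pattern.search(content):
            problems.append("possible injection: %s" % pattern.pattern)
    return problems
```

A response carrying an extra `tool_calls` field fails schema validation before the parent agent ever interprets it, which is the property that blocks smuggled actions.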
When delegation chains fail: Containing cascading errors

Multi-agent systems don't just have security risks. They have reliability risks that compound through delegation chains. This is the territory of ASI08 (cascading failures) in the OWASP Top 10 for Agentic Applications, and it's closely related to inter-agent communication security because the propagation path is the same: one agent's output becomes another agent's input.
Cascading failures in multi-agent systems are more dangerous than in traditional distributed systems for three reasons.
- Errors look valid. In a microservice, a failed call returns an error code. In a multi-agent system, a hallucinating agent returns a confident, well-formatted response that happens to be wrong. The receiving agent has no signal that the input is corrupted. It processes the response, makes decisions based on false information, and passes its output downstream. By the time a human sees the final result, the original error has been laundered through multiple layers of plausible reasoning.
- Errors compound. A buggy microservice usually returns the same wrong answer for the same input, which makes the error reproducible and therefore detectable. An agent that receives a slightly wrong input might produce a significantly wrong output, because the LLM extrapolates from the bad data. Each hop in the chain can amplify the original error rather than reproducing it.
- Errors persist. If an agent writes bad data to a shared store, database, or memory system, downstream agents that read from that store inherit the corruption. The original failing agent might have been restarted or fixed, but its incorrect output lives on in the shared state.
The defenses are structural:
- Validate at every handoff. Don't pass one agent's output to another without checking it. Run schema validation, range checks, and consistency checks on every inter-agent response before the next agent consumes it. This is the single most effective defense against cascading errors.
- Implement circuit breakers per chain. If a delegation chain hits a threshold of validation failures, halt the entire chain and notify the orchestrator. Don't let agents retry indefinitely: retry loops in multi-agent systems consume resources and can amplify errors if the retried request produces a different but equally wrong response.
- Preserve error attribution. When a chain fails, you need to know which agent caused the failure. Log every handoff with the sending agent's identity, the receiving agent's identity, the task ID, and the validation result. The audit trail should let you reconstruct the full chain and identify the point where things went wrong.
- Isolate shared state. If agents write to shared resources (databases, files, memory stores), scope those writes by agent identity and task. Agent B's output from Task X should not overwrite Agent C's input for Task Y. Use the task correlation ID to namespace shared state, and treat writes from any agent as tentative until a human or an orchestrator confirms the result.
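A per-chain circuit breaker can be sketched as a small counter keyed by task ID. The class name, the threshold, and the `on_trip` callback (standing in for notifying the orchestrator) are assumptions for illustration.

```python
class ChainCircuitBreaker:
    """Halts a delegation chain after too many failed handoff validations."""

    def __init__(self, failure_threshold=3, on_trip=None):
        self.failure_threshold = failure_threshold
        self.on_trip = on_trip    # hypothetical hook to notify the orchestrator
        self._failures = {}       # task_id -> count of failed validations
        self._open = set()        # task_ids whose chains are halted

    def is_open(self, task_id):
        return task_id in self._open

    def record_result(self, task_id, valid):
        """Record one handoff validation. Returns True if the chain may continue."""
        if task_id in self._open:
            return False
        if valid:
            return True
        self._failures[task_id] = self._failures.get(task_id, 0) + 1
        if self._failures[task_id] >= self.failure_threshold:
            self._open.add(task_id)
            if self.on_trip is not None:
                self.on_trip(task_id)
            return False
        return True
```

Each handoff reports its validation result to the breaker. Once a chain is open it stays open, so retries cannot quietly resume a chain that has already produced repeated bad output.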
The audit trail for delegation
Every delegation, every inter-agent message, and every handoff validation needs to be logged. The audit entry should capture:
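As a sketch of what such an entry could look like, the example below uses the fields this article names for handoff logging (sending agent, receiving agent, task ID, validation result) and adds an originating user, a delegation depth, and a timestamp as assumptions. The entry shape and the helper names are hypothetical, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HandoffAuditEntry:
    task_id: str            # correlation ID shared by every hop in the chain
    sending_agent: str
    receiving_agent: str
    originating_user: str   # assumed field: who the chain acts on behalf of
    delegation_depth: int   # assumed field: 1 for the first hop
    validation_result: str  # for example "passed" or "failed"
    timestamp: str


def log_handoff(log, **fields):
    """Append one JSON line per handoff to an in-memory log (a list)."""
    entry = HandoffAuditEntry(
        timestamp=datetime.now(timezone.utc).isoformat(), **fields
    )
    log.append(json.dumps(asdict(entry)))
    return entry


def reconstruct_chain(log, task_id):
    """Return all hops for one task, ordered from first hop to last."""
    entries = [json.loads(line) for line in log]
    hops = [e for e in entries if e["task_id"] == task_id]
    return sorted(hops, key=lambda e: e["delegation_depth"])
```

Because every hop carries the same `task_id`, a single filter and sort is enough to rebuild the chain even when entries arrive out of order or are interleaved with other tasks.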
When something goes wrong in a multi-agent system, the first question is always "what happened and in what order?" The audit trail needs to answer that across agent boundaries. A delegation chain that spans four agents and three handoffs should be reconstructable from a single task ID.
Securing AI agents and MCP servers with WorkOS
The delegation and multi-agent patterns in this article depend on each agent having its own authenticated identity and on every resource access being checked against the originating user's authorization. WorkOS provides that foundation: OAuth 2.1 for per-agent credentials, FGA for resource-scoped authorization checks at every delegation boundary, audit logging for reconstructing delegation chains, and native MCP server authentication.