May 27, 2026

How to secure AI agent delegation and multi-agent communication

When Agent A delegates to Agent B, whose permissions apply? Whose audit trail records the action? And what happens when Agent B is compromised?

Maria Paktiti

In September 2025, security researcher Johann Rehberger demonstrated a vulnerability he called Cross-Agent Privilege Escalation. In development environments where multiple AI agents share a codebase (Copilot, Claude, Gemini all configured in the same project), a prompt-injected agent can rewrite another agent's configuration file. Copilot gets injected, modifies Claude's .mcp.json, and on the next invocation Claude runs arbitrary code. Claude then modifies Copilot's configuration. What starts as a single indirect prompt injection becomes a self-reinforcing loop of compromised agents, each reinfecting the other.

Two months later, researchers disclosed Agent Session Smuggling: a pattern where a sub-agent embeds a silent action (in this case, a stock trade) inside an otherwise routine response. The parent agent processes the response, executes the embedded action, and the trade goes through with no prompt and no visibility to the user.

These aren't theoretical risks. They're demonstrated attack patterns against multi-agent architectures that are already in production. And they expose a fundamental gap: the security model most teams apply to single-agent systems doesn't extend to systems where agents communicate with, delegate to, and depend on other agents.

The earlier articles in this series covered agent identity, supply chain verification, tool invocation policy, and prompt injection containment. Those controls assume a single agent acting within its own permission boundary. This article covers what happens when agents cross that boundary: delegating tasks, passing context, consuming each other's output, and failing in ways that propagate.

The trust problem in multi-agent systems

A single-agent system has a clear trust model. The user trusts the agent. The agent trusts its tools (subject to supply chain verification). The tools trust the agent's credentials (subject to RBAC and FGA). Every relationship is bilateral and well-defined.

Multi-agent systems introduce transitive trust. Agent A trusts Agent B because they're both part of the same orchestration framework. Agent B trusts Agent C because it was configured as a sub-agent. But Agent A never explicitly authorized Agent C. The trust was inherited through the chain, and with it, the assumptions about what each agent is permitted to do.

This is the confused deputy problem at scale. In the original formulation, a trusted program is tricked into misusing its authority on behalf of an attacker. In multi-agent systems, every delegation is a potential confused deputy scenario. Agent A delegates a task to Agent B. Agent B has its own permissions. Does it execute the task with Agent A's permissions, its own permissions, or some combination? If Agent B has access to systems that Agent A doesn't, the delegation just escalated privileges. If Agent A has higher permissions than Agent B, and Agent B inherits them, the scoping that was supposed to contain Agent B is bypassed.

The Gradient Institute's 2025 report on multi-agent risk put it precisely: a collection of safe agents does not guarantee a safe collection of agents. Each agent might be individually well-scoped, well-tested, and well-behaved. The emergent behavior of the system, the interactions, delegations, and information flows between them, can still produce outcomes none of the individual agents would produce alone.

Authenticating agent-to-agent communication

The first defense is the same one that applies everywhere else in this series: identity. Every agent-to-agent message should be authenticated. The receiving agent should verify the sending agent's identity before processing any request, just as an MCP server verifies an agent's token before executing a tool.

In practice, this means each agent in a multi-agent system has its own client credentials (as described in the credentials guide), and every inter-agent request carries a signed token:

  
interface AgentMessage {
  senderId: string;           // Authenticated agent identity
  senderToken: string;        // JWT proving sender identity
  recipientId: string;        // Intended recipient
  taskId: string;             // Correlation ID for the delegation chain
  originatingUserId: string;  // User who started the workflow
  originatingMembershipId: string; // For FGA checks
  delegationChain?: string[]; // Agents the task has already passed through, in order
  payload: any;               // The actual request or response
  timestamp: string;
}

async function validateAgentMessage(
  message: AgentMessage
): Promise<{ valid: boolean; reason?: string }> {
  // Verify the sender's token
  const tokenClaims = await verifyAccessToken(message.senderToken);

  // Confirm token matches claimed sender
  if (tokenClaims.sub !== message.senderId) {
    return { valid: false, reason: 'Token does not match sender identity' };
  }

  // Confirm token hasn't expired
  if (Date.now() / 1000 > tokenClaims.exp) {
    return { valid: false, reason: 'Sender token expired' };
  }

  // Confirm this sender is allowed to communicate with this recipient
  const canDelegate = await checkDelegationPermission(
    message.senderId,
    message.recipientId
  );
  if (!canDelegate) {
    return { valid: false, reason: 'Sender not authorized to delegate to this agent' };
  }

  return { valid: true };
}

Without this, any process that can craft a message in the expected format can impersonate an agent. The Cross-Agent Privilege Escalation attack works precisely because there's no authentication between agents. One agent modifies another's config, and the system accepts it because it assumes anything in the shared codebase is trusted.
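The validator above checks the token and the delegation path, but it never looks at the message's own `recipientId` or `timestamp`. A minimal sketch of three extra envelope checks follows. The `MessageEnvelope` type, the `checkEnvelope` helper, the 60-second window, and the in-memory replay cache are all assumptions made for illustration, not part of the original design.

```typescript
// Hypothetical envelope checks; names and limits are assumptions.
interface MessageEnvelope {
  senderId: string;
  recipientId: string;
  taskId: string;
  timestamp: string; // ISO 8601
}

const MAX_MESSAGE_AGE_MS = 60000; // Assumed freshness window: 60 seconds

// In-memory replay cache keyed by sender, task, and timestamp.
// A real deployment would need a shared store with expiry.
const seenMessages = new Set<string>();

function checkEnvelope(
  message: MessageEnvelope,
  selfAgentId: string,
  nowMs: number = Date.now()
): { valid: boolean; reason?: string } {
  // The message must be addressed to the agent that is processing it
  if (message.recipientId !== selfAgentId) {
    return { valid: false, reason: 'Message addressed to a different agent' };
  }

  // The timestamp must parse and fall inside the freshness window
  const sentAtMs = Date.parse(message.timestamp);
  if (Number.isNaN(sentAtMs) || Math.abs(nowMs - sentAtMs) > MAX_MESSAGE_AGE_MS) {
    return { valid: false, reason: 'Timestamp missing or outside freshness window' };
  }

  // The same message must not be accepted twice
  const key = `${message.senderId}:${message.taskId}:${message.timestamp}`;
  if (seenMessages.has(key)) {
    return { valid: false, reason: 'Replayed message' };
  }
  seenMessages.add(key);

  return { valid: true };
}
```

These checks only help if the envelope fields are covered by the sender's signature or carried as token claims. Otherwise an attacker who can alter the message can alter the timestamp and recipient too.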

The delegation permission model

Authentication tells you who is sending the message. Authorization tells you whether that agent is allowed to delegate this specific task. The permission model for delegation needs to answer three questions:

Who can delegate to whom? Not every agent should be able to call every other agent. Define explicit delegation paths. The support agent can delegate to the knowledge base agent but not to the deployment agent. The orchestrator can delegate to all agents, but sub-agents cannot delegate to each other without going through the orchestrator.

  
const delegationPolicies: Record<string, string[]> = {
  'orchestrator':        ['support-agent', 'analyst-agent', 'deploy-agent'],
  'support-agent':       ['kb-agent'],
  'analyst-agent':       ['db-query-agent'],
  'deploy-agent':        [],  // Terminal agent, cannot delegate
  'kb-agent':            [],  // Terminal agent, cannot delegate
  'db-query-agent':      [],  // Terminal agent, cannot delegate
};

function checkDelegationPermission(
  senderId: string,
  recipientId: string
): boolean {
  const allowed = delegationPolicies[senderId] || [];
  return allowed.includes(recipientId);
}
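Because trust in a delegation chain is transitive, it can help to compute which agents each sender can reach through any number of hops, not only its direct delegates. The sketch below assumes a policy map in the same shape as `delegationPolicies`. The `reachableAgents` and `hasDelegationCycle` helpers are hypothetical names introduced for this example.

```typescript
// Policy map in the same shape as delegationPolicies above
const policies: Record<string, string[]> = {
  'orchestrator':   ['support-agent', 'analyst-agent', 'deploy-agent'],
  'support-agent':  ['kb-agent'],
  'analyst-agent':  ['db-query-agent'],
  'deploy-agent':   [],
  'kb-agent':       [],
  'db-query-agent': [],
};

// Returns every agent reachable from `start` through one or more delegations.
function reachableAgents(
  start: string,
  policyMap: Record<string, string[]>
): Set<string> {
  const reached = new Set<string>();
  const queue: string[] = [...(policyMap[start] ?? [])];
  while (queue.length > 0) {
    const agent = queue.shift() as string;
    if (reached.has(agent)) continue; // Already visited; also guards against cycles
    reached.add(agent);
    queue.push(...(policyMap[agent] ?? []));
  }
  return reached;
}

// Returns true if any agent can reach itself, meaning the policy contains a cycle.
function hasDelegationCycle(policyMap: Record<string, string[]>): boolean {
  return Object.keys(policyMap).some((agent) =>
    reachableAgents(agent, policyMap).has(agent)
  );
}
```

With the policy above, `support-agent` reaches only `kb-agent`, while `orchestrator` reaches all five other agents. Running a check like this when the policy changes is one way to surface inherited trust that nobody explicitly approved.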

What permissions does the receiving agent operate with? This is the critical design decision. The receiving agent's effective permissions should be the intersection of its own role and the originating authorization scope. Agent B can never exceed its own role permissions, and it can never exceed the permissions that the originating user authorized. Both checks must pass.

Diagram showing how an agent's effective permissions are calculated as the intersection of the user's authorized scope and the agent's role permissions. The user has six permissions including admin:export and users:manage. The agent's role has five permissions including kb:read and kb:search. Only the three permissions that appear in both sets (tickets:read, tickets:write, billing:read) become the agent's effective permissions. Permissions that exist in only one set are blocked.

  
async function authorizeDelegatedTask(
  recipientAgentId: string,
  originatingMembershipId: string,
  toolName: string,
  resourceTypeSlug: string,
  resourceExternalId: string
): Promise<{ authorized: boolean; reason?: string }> {
  // Check 1: Does the recipient agent's role permit this tool?
  const agentPermitted = await checkRolePermission(recipientAgentId, toolName);
  if (!agentPermitted) {
    return { authorized: false, reason: 'Recipient agent role does not permit this tool' };
  }

  // Check 2: Does the originating user's membership authorize this resource?
  const { authorized } = await workos.authorization.check({
    organizationMembershipId: originatingMembershipId,
    permissionSlug: getRequiredPermission(toolName),
    resourceTypeSlug,
    resourceExternalId,
  });

  if (!authorized) {
    return { authorized: false, reason: 'Originating user not authorized for this resource' };
  }

  return { authorized: true };
}

This double check prevents the two most common delegation vulnerabilities. If Agent B has broader permissions than the originating user authorized (privilege escalation through delegation), the FGA check against the originating user's membership blocks the request. If Agent A tries to use Agent B for a tool that Agent B's own role doesn't permit (scope bypass through delegation), the role check blocks it.
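The intersection rule can also be expressed directly as a set operation. This is a sketch only: the permission names come from the diagram description above (the user's sixth permission isn't named there, so it is omitted), and `effectivePermissions` is a hypothetical helper.

```typescript
// Permission names taken from the diagram; the user's sixth permission is not named and is omitted.
const userScope = new Set<string>([
  'tickets:read', 'tickets:write', 'billing:read', 'admin:export', 'users:manage',
]);
const agentRole = new Set<string>([
  'tickets:read', 'tickets:write', 'billing:read', 'kb:read', 'kb:search',
]);

// Effective permissions are those present in BOTH the user's scope and the agent's role.
function effectivePermissions(
  user: ReadonlySet<string>,
  role: ReadonlySet<string>
): Set<string> {
  const effective = new Set<string>();
  user.forEach((permission) => {
    if (role.has(permission)) effective.add(permission);
  });
  return effective;
}

const effective = effectivePermissions(userScope, agentRole);
// effective contains tickets:read, tickets:write, billing:read
```

Here `admin:export` and `users:manage` are dropped because the agent's role lacks them, and `kb:read` and `kb:search` are dropped because the user never authorized them.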

How deep can the chain go? Unbounded delegation chains are dangerous. Agent A delegates to Agent B, which delegates to Agent C, which delegates to Agent D. Each hop adds latency, dilutes context, and creates another point where permissions might escalate or errors might compound. Set a maximum delegation depth and enforce it:

  
const MAX_DELEGATION_DEPTH = 3;

function checkDelegationDepth(message: AgentMessage): boolean {
  const depth = message.delegationChain?.length || 0;
  return depth < MAX_DELEGATION_DEPTH;
}
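The depth check only works if every hop extends the chain honestly. A sketch of that step follows, assuming `delegationChain` is an ordered list of the agent IDs that have already sent the task onward. The `ChainedMessage` type and `extendChain` helper are hypothetical.

```typescript
const MAX_DEPTH = 3; // Mirrors MAX_DELEGATION_DEPTH above

interface ChainedMessage {
  senderId: string;
  recipientId: string;
  delegationChain?: string[]; // Agents that have already delegated this task, in order
}

// Builds the chain for the next hop, or returns an error if the hop is not allowed.
function extendChain(
  message: ChainedMessage
): { chain: string[] } | { error: string } {
  const chain = message.delegationChain ?? [];

  // Same bound as the depth check: a chain that is already full cannot grow
  if (chain.length >= MAX_DEPTH) {
    return { error: 'Maximum delegation depth reached' };
  }

  // A recipient that already appears in the chain would create a loop
  if (chain.includes(message.recipientId) || message.recipientId === message.senderId) {
    return { error: 'Delegation loop detected' };
  }

  // Return a new array so the caller's chain is never mutated
  return { chain: [...chain, message.senderId] };
}
```

Rejecting recipients that already appear in the chain is an addition beyond the depth limit. It catches loops early instead of waiting for the depth bound to trip.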

Validating inter-agent messages

Diagram showing a delegation chain from User to Orchestrator to Support agent to KB agent, with validation checkpoints between each handoff. Each checkpoint verifies the sender's token, confirms the delegation path is allowed, intersects permissions, and checks that delegation depth hasn't exceeded the maximum. Response validation at each boundary checks schema, scans for embedded instructions, and enforces size limits. An audit trail runs across the bottom, logging every delegation and validation result with a task correlation ID.

Authentication and authorization aren't enough. The content of inter-agent messages also needs validation. Agent Session Smuggling works because the parent agent processes the sub-agent's response without checking whether it contains embedded actions.

Every inter-agent message should be validated at the receiving end:

Schema validation. Define a schema for what a valid response looks like for each task type. If the support agent delegates a knowledge base lookup, the response should contain text content and source references. If it contains tool invocations, action requests, or instructions, something is wrong.

  
const expectedResponseSchemas: Record<string, (response: any) => boolean> = {
  'kb_lookup': (res) => {
    return typeof res.content === 'string'
      && Array.isArray(res.sources)
      && !res.toolCalls
      && !res.actions;
  },
  'data_query': (res) => {
    return Array.isArray(res.rows)
      && typeof res.rowCount === 'number'
      && !res.toolCalls;
  },
};

function validateAgentResponse(taskType: string, response: any): boolean {
  const validator = expectedResponseSchemas[taskType];
  if (!validator) return false; // Unknown task types are rejected
  return validator(response);
}

Injection scanning. Scan inter-agent messages for content that looks like instructions rather than data. The same patterns described in the supply chain article apply here: phrases like "ignore previous instructions," "override your system prompt," or encoded variants should be flagged.

Size and rate limits. A sub-agent returning a response that's ten times larger than expected is suspicious. A sub-agent that responds with a burst of follow-up messages is suspicious. Set bounds on response size and frequency.
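The last two checks can be sketched as follows. The pattern list, the size bound, and the `scanAgentText` and `MessageRateLimiter` names are illustrative assumptions. A short deny-list like this catches only the crudest injections and would need to sit alongside the schema validation above.

```typescript
// Illustrative patterns only; a real deny-list would be broader.
const INSTRUCTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/,
  /override your system prompt/,
];

const MAX_RESPONSE_CHARS = 20000; // Assumed bound; tune per task type

// Normalizes text so trivially obfuscated phrases still match.
function normalizeText(text: string): string {
  return text
    .normalize('NFKC')                     // Fold compatibility forms such as full-width letters
    .replace(/[\u200B-\u200D\uFEFF]/g, '') // Drop zero-width characters
    .replace(/\s+/g, ' ')                  // Collapse whitespace runs
    .toLowerCase();
}

function scanAgentText(text: string): { flagged: boolean; reason?: string } {
  if (text.length > MAX_RESPONSE_CHARS) {
    return { flagged: true, reason: 'Response exceeds size limit' };
  }
  const normalized = normalizeText(text);
  const hit = INSTRUCTION_PATTERNS.find((pattern) => pattern.test(normalized));
  if (hit) {
    return { flagged: true, reason: `Matched instruction-like pattern: ${hit.source}` };
  }
  return { flagged: false };
}

// Sliding-window limiter: at most `limit` messages per agent per `windowMs`.
class MessageRateLimiter {
  private readonly history = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(agentId: string, nowMs: number = Date.now()): boolean {
    // Keep only the timestamps that still fall inside the window
    const recent = (this.history.get(agentId) ?? []).filter(
      (sentAt) => nowMs - sentAt < this.windowMs
    );
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.history.set(agentId, recent);
    return allowed;
  }
}
```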

When delegation chains fail: Containing cascading errors

Diagram comparing two scenarios. Without validation: Agent A hallucinates a customer ID, Agent B queries billing for the wrong customer, Agent C generates a report with wrong data, and the user sees a confident, polished, completely wrong result. Each agent adds plausible reasoning and the original error becomes invisible. With handoff validation: Agent A hallucinates the same customer ID, but a validation checkpoint catches the consistency error before Agent B ever receives it. The chain halts at depth 1, the orchestrator is notified, and a human reviews the error early.

Multi-agent systems don't just have security risks. They have reliability risks that compound through delegation chains. This is the territory of ASI08 (Cascading Failures) in the OWASP Top 10 for Agentic Applications, and it's closely related to inter-agent communication security because the propagation path is the same: one agent's output becomes another agent's input.

Cascading failures in multi-agent systems are more dangerous than in traditional distributed systems for three reasons.

Errors look valid. In a microservice, a failed call returns an error code. In a multi-agent system, a hallucinating agent returns a confident, well-formatted response that happens to be wrong. The receiving agent has no signal that the input is corrupted. It processes the response, makes decisions based on false information, and passes its output downstream. By the time a human sees the final result, the original error has been laundered through multiple layers of plausible reasoning.

Errors compound. A buggy microservice tends to return the same wrong answer every time, which makes the error detectable. An agent that receives a slightly wrong input might produce a significantly wrong output, because the LLM extrapolates from the bad data. Each hop in the chain can amplify the original error rather than reproducing it.

Errors persist. If an agent writes bad data to a shared store, database, or memory system, downstream agents that read from that store inherit the corruption. The original failing agent might have been restarted or fixed, but its incorrect output lives on in the shared state.

The defenses are structural:

Validate at every handoff. Don't pass one agent's output to another without checking it. Run schema validation, range checks, and consistency checks on every inter-agent response before the next agent consumes it. This is the single most effective defense against cascading errors.

  
async function handoffWithValidation(
  fromAgent: string,
  toAgent: string,
  taskType: string,
  response: any,
  originalInput: any
): Promise<{ proceed: boolean; reason?: string }> {
  // Schema check
  if (!validateAgentResponse(taskType, response)) {
    return { proceed: false, reason: `Invalid response schema from ${fromAgent}` };
  }

  // Consistency check: does the response relate to the original input?
  const relevanceScore = await checkResponseRelevance(originalInput, response);
  if (relevanceScore < 0.7) {
    return { proceed: false, reason: `Response from ${fromAgent} appears unrelated to task` };
  }

  // Size check
  const responseSize = JSON.stringify(response).length;
  if (responseSize > MAX_RESPONSE_SIZE) {
    return { proceed: false, reason: `Response from ${fromAgent} exceeds size limit` };
  }

  return { proceed: true };
}

Implement circuit breakers per chain. If a delegation chain hits a threshold of validation failures, halt the entire chain and notify the orchestrator. Don't let agents retry indefinitely: retry loops in multi-agent systems consume resources and can amplify errors if the retried request produces a different but equally wrong response.

Preserve error attribution. When a chain fails, you need to know which agent caused the failure. Log every handoff with the sending agent's identity, the receiving agent's identity, the task ID, and the validation result. The audit trail should let you reconstruct the full chain and identify the point where things went wrong.

Isolate shared state. If agents write to shared resources (databases, files, memory stores), scope those writes by agent identity and task. Agent B's output from Task X should not overwrite Agent C's input for Task Y. Use the task correlation ID to namespace shared state, and treat writes from any agent as tentative until a human or an orchestrator confirms the result.
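A per-chain circuit breaker that also keeps attribution might look like the sketch below. It is keyed by the task correlation ID. The `ChainCircuitBreaker` class, the `HandoffFailure` shape, and the `stateKey` helper are hypothetical, and the in-memory map stands in for whatever store the orchestrator actually uses.

```typescript
// Hypothetical per-chain circuit breaker keyed by task correlation ID.
interface HandoffFailure {
  fromAgent: string;
  toAgent: string;
  reason: string;
}

class ChainCircuitBreaker {
  private readonly failures = new Map<string, HandoffFailure[]>();
  private readonly threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Records a failed handoff and returns true if the chain should now halt.
  recordFailure(taskId: string, failure: HandoffFailure): boolean {
    const list = this.failures.get(taskId) ?? [];
    list.push(failure);
    this.failures.set(taskId, list);
    return list.length >= this.threshold;
  }

  // True once the chain has reached the failure threshold.
  isOpen(taskId: string): boolean {
    return (this.failures.get(taskId)?.length ?? 0) >= this.threshold;
  }

  // Returns the earliest recorded failure, a starting point for attribution.
  firstFailure(taskId: string): HandoffFailure | undefined {
    return this.failures.get(taskId)?.[0];
  }
}

// Namespaces a shared-state key by task and agent so chains cannot overwrite each other.
function stateKey(taskId: string, agentId: string, name: string): string {
  return `${taskId}/${agentId}/${name}`;
}
```

An orchestrator would check `isOpen(taskId)` before each handoff and stop dispatching once it returns true, rather than letting individual agents decide whether to retry.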

The audit trail for delegation

Every delegation, every inter-agent message, and every handoff validation needs to be logged. The audit entry should capture:

  
await workos.auditLogs.createEvent({
  organizationId: orgId,
  event: {
    action: 'agent.delegation',
    actor: {
      type: 'agent',
      id: senderId,
      metadata: {
        originating_user: originatingUserId,
        delegation_depth: currentDepth,
      },
    },
    targets: [{
      type: 'agent',
      id: recipientId,
    }],
    context: {
      task_id: taskId,
      task_type: taskType,
      validation_result: validationResult,
      permissions_check: permissionsResult,
    },
    occurred_at: new Date().toISOString(),
  },
});

When something goes wrong in a multi-agent system, the first question is always "what happened and in what order?" The audit trail needs to answer that across agent boundaries. A delegation chain that spans four agents and three handoffs should be reconstructable from a single task ID.
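Reconstructing a chain from a task ID could be as simple as the sketch below. It assumes the audit entries have already been fetched into memory and flattened into a hypothetical `DelegationAuditEntry` shape, and that the chain is linear (each agent delegates to at most one other agent per task). Neither assumption comes from the audit log API itself.

```typescript
// Hypothetical flattened view of the delegation audit events logged above.
interface DelegationAuditEntry {
  taskId: string;
  senderId: string;
  recipientId: string;
  delegationDepth: number;
  occurredAt: string; // ISO 8601
}

// Returns the ordered list of agents a task passed through, or [] if none were logged.
function reconstructChain(
  entries: DelegationAuditEntry[],
  taskId: string
): string[] {
  const hops = entries
    .filter((entry) => entry.taskId === taskId) // filter() copies, so sort() leaves the input untouched
    .sort(
      (a, b) =>
        a.delegationDepth - b.delegationDepth ||
        Date.parse(a.occurredAt) - Date.parse(b.occurredAt)
    );

  if (hops.length === 0) return [];
  // The first sender starts the chain; each hop then contributes its recipient
  return [hops[0].senderId, ...hops.map((hop) => hop.recipientId)];
}
```

Sorting by depth first and timestamp second keeps the order stable even when two hops are logged within the same millisecond.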

Securing AI agents and MCP servers with WorkOS

The delegation and multi-agent patterns in this article depend on each agent having its own authenticated identity and on every resource access being checked against the originating user's authorization. WorkOS provides that foundation: OAuth 2.1 for per-agent credentials, FGA for resource-scoped authorization checks at every delegation boundary, audit logging for reconstructing delegation chains, and native MCP server authentication.
