April 22, 2026

Best practices for AI agent access control

Identity, authorization, and oversight patterns for systems that act on their own.

AI agents introduce an access control problem that traditional identity and access management was not designed for. A conventional application acts on inputs and returns outputs within a narrow, predictable surface. An agent, by contrast, interprets natural language goals, chooses which tools to invoke, chains those calls together, and can operate autonomously for extended periods. The authorization decisions embedded in that loop are dynamic, the principal is probabilistic, and the blast radius of a single misstep can be large. What follows are the practices that hold up under contact with real users and real adversaries.

TL;DR

  • Give every agent its own identity. Agents should authenticate as distinct principals, not by reusing user sessions or long-lived API keys.
  • Enforce least privilege with fine-grained, capability-based scopes. Narrow tokens to specific verbs on specific resources, and put policy-enforcing proxies in front of APIs whose native scopes are too coarse.
  • Use short-lived credentials and rotate aggressively. Prefer minute-scale token lifetimes and just-in-time issuance over long-lived secrets, and track every cache where a credential might persist.
  • Make authorization decisions context-aware, and express policies as code. Evaluate identity, resource, action, and runtime attributes together (ABAC-style), and keep policies version-controlled and testable.
  • Separate user authority from agent authority, and defend against confused deputy attacks. The agent's effective authority should never exceed the intersection of what the agent and the user are each permitted to do.
  • Require out-of-band human approval for high-impact and irreversible actions. The approval channel must be one the agent cannot forge from its own context.
  • Treat every tool input and output as untrusted. Retrieved content can inform the agent's reasoning but must not, on its own, authorize a tool call.
  • Log everything, and make the log legible. Capture provenance, policy decisions, and the context that led to each step; feed signals into your SIEM.
  • Apply rate limits, quotas, and circuit breakers. Cap damage from runaway loops at per-agent, per-session, per-tool, and per-resource levels, and keep kill switches ready.
  • Isolate execution environments. Sandbox code execution, browsing, and file editing with their own identities, filesystems, and egress allowlists.
  • Plan for revocation, rotation, and deprovisioning from day one. Exercise the revocation path before you need it.

The rest of this article elaborates on each of these, in order.

Give the agent its own identity

The first and most foundational decision is to issue the agent a distinct principal in your identity provider, separate from any human user it serves. Agents should not authenticate by reusing a user's session cookie, OAuth token, or long-lived API key. They should have their own service identity, with its own credentials, its own audit trail, and its own lifecycle.

This matters for several reasons. First, it lets you attribute actions accurately. When an incident happens, you need to know whether a write was issued by Alice the human or by the agent operating on Alice's request, because the remediation paths differ. Second, it lets you apply different policies to agent traffic than to human traffic; you might, for instance, require stricter rate limits, forbid certain destructive operations entirely, or log at a higher verbosity. Third, it prevents credential theft from being a total compromise. If an agent has its own scoped credentials, stealing them does not give the attacker Alice's full access to every system Alice uses.

In practice, this usually means registering each agent (or each agent deployment) as a service account, workload identity, or OAuth client. When the agent acts on behalf of a user, use a delegation protocol such as OAuth 2.0 token exchange (RFC 8693) or an on-behalf-of flow, so that the resulting token carries both the agent's identity and the user's identity as separate claims. Downstream services can then evaluate each independently instead of collapsing them into a single principal. For details on how to do this, see our guide on giving AI agents their own credentials.
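A token exchange request of this kind can be sketched as follows. This is a minimal illustration of the RFC 8693 form parameters, assuming a hypothetical IdP endpoint, client ID, and audience; a real flow would also authenticate the agent (client secret, mTLS, or a signed assertion) and send the request to the IdP's token endpoint.

```python
# Build the form parameters for an OAuth 2.0 token exchange (RFC 8693).
# The agent presents the user's token as the subject_token and authenticates
# as itself, so the resulting token carries both principals as separate claims.

def build_token_exchange_request(user_token: str, agent_client_id: str,
                                 audience: str) -> dict:
    """Form parameters for an RFC 8693 token exchange request."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        # The user's token: proves who the agent is acting for.
        "subject_token": user_token,
        "subject_token_type": "urn:ietf:params:oauth:token-type:access_token",
        # The agent authenticates separately as this client, so downstream
        # services can evaluate agent identity and user identity independently.
        "client_id": agent_client_id,
        "audience": audience,
    }

# Hypothetical values for illustration only.
params = build_token_exchange_request("eyJ...user", "agent-7f2",
                                      "https://api.internal")
```

The downstream service then sees two claims, not one collapsed principal, and can apply different policy to each.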

Enforce least privilege with fine-grained, capability-based scopes

Least privilege is cited constantly and implemented rarely. The typical failure mode is familiar: an agent is given an API token that works for the demo, the scope is never tightened, and six months later that token can do everything in the account. For agents this is especially dangerous because their behavior is probabilistic; they may attempt actions the designer never anticipated, and if the token allows those actions, they will succeed.

The target is capability-based authorization: the agent holds tokens that grant specific verbs on specific resources, not broad role memberships. Instead of repo:write across the entire GitHub organization, you want a token that can open pull requests in one repository and nothing else. Instead of mail.send, you want a token that can send from one mailbox to a pre-approved domain list. The closer the scope matches the task, the smaller the attack surface becomes.

Fine-grained scoping is tedious to implement because most APIs were not designed with this granularity in mind. When the underlying API only offers coarse scopes, put a policy-enforcing proxy in front of it and give the agent credentials to the proxy rather than to the API directly. The proxy becomes the place where you can also log every call, apply rate limits, and require approvals. Over time, the accumulation of narrow proxies becomes a coherent internal authorization plane for agent traffic, which is far more valuable than any individual policy rule.
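The core of such a proxy is a capability check that the upstream API cannot express. A minimal sketch, with hypothetical agent and capability names:

```python
# Policy-enforcing proxy check: the agent's credential maps to narrow
# (verb, resource) capabilities even though the upstream API only offers
# coarse scopes. Names below are illustrative.

AGENT_CAPABILITIES = {
    "agent-pr-bot": {
        ("create_pull_request", "repo:acme/widgets"),
        ("read", "repo:acme/widgets"),
    },
}

def proxy_allows(agent_id: str, verb: str, resource: str) -> bool:
    """Forward the call only if this exact verb-on-resource was granted."""
    return (verb, resource) in AGENT_CAPABILITIES.get(agent_id, set())
```

The same choke point is where logging, rate limits, and approval requirements attach, which is why the accumulated proxies end up forming an authorization plane.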

Use short-lived credentials and rotate aggressively

Long-lived secrets are the single most common root cause of serious agent compromises. A key that was pasted into a prompt once, or checked into a repo, or cached in a log, or stored in an unencrypted tool definition, becomes a permanent liability. The mitigation is to stop issuing long-lived secrets at all where you can.

Prefer credentials with lifetimes measured in minutes. Workload identity federation, short-lived OIDC tokens, and STS-style assume-role flows let the agent obtain a fresh token for each task or each session, with no static secret ever living on disk. This is the just-in-time access pattern, and it applies just as well to agent principals as it does to human operators: permissions are minted on demand, scoped to the task at hand, and retired automatically when the task completes. When the agent must call a third-party API that only accepts long-lived keys, store those keys in a secrets manager that the agent queries at runtime, and rotate them on a schedule short enough that a leaked key stops working before an attacker can exploit it. Track every place a credential could be cached (agent memory, tool results, conversation transcripts, crash dumps, error reports sent to vendors) and ensure rotation actually propagates to all of them. A rotated key that still lives in a forgotten log is not rotated.

Make authorization decisions context-aware, and express policies as code

A scoped token tells you what an agent is permitted to attempt. Context-aware authorization decides whether each specific attempt should succeed given everything else the system knows at the moment of the call. The two are complementary, and the second is usually underdeveloped in early agent deployments. Even a narrowly scoped token is a static artifact: the richer runtime context (the user the agent is acting for, the resource being touched, the time of day, the origin of the request, the classification of the data involved, the trust level of the content that led to this step, the rate at which similar calls have been made) should all factor into the decision to allow or deny.

This is the idea behind attribute-based access control (ABAC) and its policy-based variants. Instead of embedding authority in the token and treating presentation of the token as sufficient, you express policies as rules that are evaluated at runtime: "this agent, acting for this user, may read records of this classification during business hours, from this network, provided the last N calls have not tripped any anomaly detector, and provided no untrusted content in the current context requested this action." The rules live in a dedicated policy engine (OPA, Cedar, or an equivalent), and every sensitive call queries the engine before executing. This gives you a single place to reason about authorization rather than a set of implicit checks scattered across tools.
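The shape of such a decision can be shown in plain Python. In production the rule would live in a policy engine such as OPA or Cedar and every sensitive call would query it; the attributes and thresholds below are illustrative.

```python
# ABAC-style decision sketch: identity, resource, and runtime attributes
# are evaluated together at call time. Attribute names and thresholds
# are illustrative, not a real policy engine's schema.

def authorize(request: dict) -> bool:
    """Allow only when every contextual condition holds."""
    business_hours = 9 <= request["hour"] < 18
    trusted_context = not request["untrusted_content_in_context"]
    clearance_ok = request["user_clearance"] >= request["resource_classification"]
    under_rate = request["recent_similar_calls"] < 20
    return business_hours and trusted_context and clearance_ok and under_rate

allowed = authorize({
    "hour": 14,
    "untrusted_content_in_context": False,
    "user_clearance": 2,
    "resource_classification": 2,
    "recent_similar_calls": 3,
})
```

Note that the token never appears in this function: presentation of a valid token gets the request to the policy engine, but the context decides whether this particular call goes through.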

Express those policies as code. Policy-as-code is not a framework preference; it is how you make authorization changes reviewable, reversible, and reproducible. Policies that live only in dashboards drift, cannot be diffed, and leave you with no mechanism to catch a regression before it reaches production. Check policy sources into version control, write unit tests against them, run them in CI against representative decision traces, and deploy them through the same pipeline as the rest of your infrastructure. When the scope layer, the policy engine, and the provenance layer all enforce a consistent model, you have something resembling a real control plane for agent traffic rather than a collection of ad hoc checks.

Separate user authority from agent authority, and resist confused deputy attacks

The confused deputy problem is the defining authorization failure mode for agents. It occurs when a privileged agent is tricked by a less privileged party, often through prompt injection or a malicious document, into using its privileges on behalf of the attacker. The agent is not compromised in the classical sense; it is doing exactly what it was built to do, but with inputs it should not have trusted.

The defense has two parts. First, the agent's effective authority for any action should never exceed the intersection of what the agent itself is permitted to do and what the requesting user is permitted to do. If Alice cannot delete the production database, the agent should not be able to delete it on Alice's behalf, even if the agent's own service credentials technically allow it. Implementing this correctly requires authorization checks that consider both principals, which is why on-behalf-of token flows matter: they let the downstream service evaluate both identities in the same decision.
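The intersection rule itself is one line; the work is plumbing both identities to the point of decision. A minimal sketch with illustrative permission names:

```python
# Intersection rule sketch: the agent's effective authority is what both
# the agent's own grant and the delegating user's grant allow.
# Permission names are illustrative.

AGENT_GRANTS = {"db.read", "db.write", "db.delete"}

def effective_permissions(user_grants: set[str]) -> set[str]:
    """Effective authority = agent grants ∩ user grants."""
    return AGENT_GRANTS & user_grants

# Alice cannot delete, so the agent acting for Alice cannot delete either,
# even though the agent's own service credential technically allows it.
alice = {"db.read", "db.write"}
```

This is exactly the evaluation that on-behalf-of tokens enable: the downstream service holds both claim sets and intersects them in one decision.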

This principle has a specific and often-missed consequence in retrieval-augmented generation. If an agent retrieves documents using its own credentials and filters them only after the model has generated a response, the agent has effectively given the user access to documents the user was never supposed to see, because the model can surface their contents through summarization, paraphrase, or inference even without quoting them. The authorization check must happen at retrieval time, before documents enter the context window, not at the output step. In practice this means the retriever evaluates each candidate document against the requesting user's permissions, and documents that fail the check never reach the model at all.
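A retrieval-time filter can be sketched as below, with a hypothetical ACL lookup standing in for whatever permission store the retriever consults. The essential property is that unauthorized documents are dropped before they ever enter the context window.

```python
# Retrieval-time permission filtering for RAG. The ACL table is a
# hypothetical stand-in for the real permission store; the point is the
# filter runs before documents reach the model, not after generation.

DOC_ACL = {
    "doc-roadmap": {"alice", "bob"},
    "doc-comp-plan": {"hr-team"},
}

def filter_for_user(candidates: list[str], user: str) -> list[str]:
    """Keep only documents the requesting user may read."""
    return [d for d in candidates if user in DOC_ACL.get(d, set())]

# The retriever found both documents, but only one survives the check.
context_docs = filter_for_user(["doc-roadmap", "doc-comp-plan"], "alice")
```

Filtering at the output step instead would let the model leak the compensation plan through summary or inference even without quoting it.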

Second, data that enters the agent's context from untrusted sources (retrieved documents, web pages, emails, tool outputs) must not be treated as instructions that expand the agent's authority. It is fine for a document to inform the agent's reasoning; it is not fine for a document to cause the agent to call send_email with contents chosen by the document. This is an authorization question, not only a prompt engineering question. Policy should encode which tools can be invoked in response to which input provenance, and sensitive tools should require a user confirmation step whenever the triggering context includes untrusted content.

Require human approval for high-impact and irreversible actions

Not every action warrants the same level of autonomy. Reading a file, drafting a message, and querying an analytics database are cheap to undo or have no side effects. Sending the message, deploying the build, wiring the funds, deleting the records, and granting another principal access are not. For that second class of actions, human approval in the loop is a primary control, not a courtesy.

The approval step must be out of band with respect to the agent's context. If an attacker has injected instructions into the agent's working memory, a confirmation prompt that the agent itself renders and answers provides no security at all. Approval should be surfaced through a channel the agent cannot forge: a separate UI, a push notification, a chat message to the authenticated user, or a signed request that the user approves in their own session. The approving system, not the agent, holds the final authorization decision.

Design the approval taxonomy deliberately. Classify tools and actions by reversibility, blast radius, and sensitivity of data touched, and encode which categories require which level of human oversight. Avoid a binary of "autonomous" versus "always asks." The former inevitably expands until something bad happens, and the latter trains users to click approve reflexively until the confirmation step is security theater.
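One way to encode such a taxonomy is a small classifier over reversibility and blast radius; the tier names and thresholds below are illustrative.

```python
from enum import Enum

# Approval taxonomy sketch: map reversibility and blast radius to an
# oversight tier, avoiding the autonomous/always-asks binary.
# Tier names and thresholds are illustrative.

class Oversight(Enum):
    AUTONOMOUS = "autonomous"
    NOTIFY = "notify"            # proceed, but surface it to the user
    OUT_OF_BAND = "out_of_band"  # block until approved outside agent context

def required_oversight(reversible: bool, blast_radius: str) -> Oversight:
    """Irreversible or wide-impact actions need out-of-band approval."""
    if not reversible or blast_radius == "high":
        return Oversight.OUT_OF_BAND
    if blast_radius == "medium":
        return Oversight.NOTIFY
    return Oversight.AUTONOMOUS
```

The middle tier matters: it keeps users informed without training them to rubber-stamp every prompt.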

Treat every tool input and output as untrusted

The agent's loop mixes trusted inputs (the user's direct instructions, the system prompt, your policy) with a great deal of untrusted material (web content, document contents, API responses, data belonging to other users). From an access control standpoint, these streams must be isolated. The rule is simple to state: data from untrusted sources can inform the agent's answer, but it cannot by itself authorize an action.

Concretely, this means the authorization layer should know the provenance of the context that led to each tool call. If the agent decides to call a sensitive tool after reading an attacker-controlled document, that is a signal to require extra confirmation or refuse outright. Markers such as "the user has already approved this" appearing inside retrieved content should be disregarded; approvals come from the user's authenticated channel, not from strings in the context window.
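A provenance-aware gate can be sketched as follows. Tool and source names are illustrative; the rule is that a sensitive tool fires only when every piece of triggering context came from a trusted source.

```python
# Provenance-aware tool gating sketch. Each context item carries a
# provenance tag; sensitive tools refuse to fire when any triggering
# context is untrusted. Tool and source names are illustrative.

SENSITIVE_TOOLS = {"send_email", "transfer_funds"}
TRUSTED_SOURCES = {"user_message", "system_prompt"}

def tool_call_permitted(tool: str, context_sources: set[str]) -> bool:
    """Untrusted content can inform reasoning but not authorize a call."""
    if tool not in SENSITIVE_TOOLS:
        return True
    # Every source behind this call must be trusted.
    return context_sources <= TRUSTED_SOURCES
```

A string like "the user has already approved this" inside a retrieved document never changes the provenance tag, so it cannot flip the decision.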

Symmetrically, outputs returned from tools should be sanitized before being shown to the user or fed back into the agent's reasoning, and they should never be allowed to exfiltrate data through their content. Returning a URL that the agent then dutifully fetches is a classic data exfiltration path. Restrict outbound network egress from the agent's execution environment to a strict allowlist, and treat any attempt to reach a destination outside that list as a policy violation worth logging and alerting on.

Log everything, and make the log legible

Agent transcripts are the only reliable way to reconstruct what happened after an incident. A sufficient log is more than a list of tool calls. For each step it should include the prompt or goal that triggered the step, the tool invoked, the arguments, the result, the identity under which the call was made, the policy decision and its reasoning, and the provenance of the context that led to the decision. Redact secrets and personal data at write time, not at read time, because logs get copied, mirrored, and shipped to vendors in ways you will not fully anticipate.
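A structured record mirroring those fields, with write-time redaction, might look like this. Field names and the secret pattern are illustrative; a real redactor would cover your organization's actual credential formats.

```python
import re
from dataclasses import dataclass, asdict

# Structured agent log entry with secrets redacted at write time.
# Field names and the secret pattern are illustrative.

SECRET_PATTERN = re.compile(r"(sk-|AKIA)[A-Za-z0-9]+")

def redact(text: str) -> str:
    """Strip likely credentials before the entry is ever written."""
    return SECRET_PATTERN.sub("[REDACTED]", text)

@dataclass
class AgentLogEntry:
    goal: str              # prompt or goal that triggered the step
    tool: str
    arguments: str
    result: str
    principal: str         # identity under which the call was made
    policy_decision: str   # allow/deny plus the reasoning
    provenance: list[str]  # sources of the context behind the decision

entry = AgentLogEntry(
    goal=redact("rotate key sk-abc123XYZ"),
    tool="rotate_key", arguments="{}", result="ok",
    principal="agent-7f2/alice", policy_decision="allow: scoped grant",
    provenance=["user_message"],
)
```

Because redaction runs before the write, the secret never exists in any copy, mirror, or vendor export of the log.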

The value of the log compounds when it is queryable. You want to be able to answer questions like "which tool calls this week happened after the agent ingested content from outside the corporate domain" or "show me every approval prompt that was dismissed within two seconds." These patterns reveal emerging abuse and misconfigured workflows before they become incidents. Feed a subset of the log into your SIEM and build alerts for anomalous verb-on-resource combinations, unusual tool-call volumes, and sudden changes in the set of APIs an agent touches.

Apply rate limits, quotas, and circuit breakers

Agents can loop. A misbehaving planner can call the same tool thousands of times in a minute, either because of a logic error, a poisoned context, or an adversarial input. Rate limits and quotas are the bluntest and most reliable way to cap the damage from such runaway loops, and they should be applied at multiple layers: per agent instance, per user session, per tool, and per sensitive resource. A single global rate limit is rarely enough because it will be dominated by healthy traffic and miss a single abusive session.

Circuit breakers complement rate limits by reacting to error signals. If an agent's tool calls are being rejected at an unusual rate, or if it is being prompted in ways that consistently trigger policy violations, the right response is often to stop the session and escalate, not to retry. Build kill switches that can disable an agent, or a specific tool, globally without a code deploy, and make sure your on-call has both the access and the runbook to use them under pressure.
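The breaker logic is small; the operational work is wiring it to real error signals and an escalation path. A minimal sketch, with an illustrative threshold:

```python
# Per-agent circuit breaker sketch layered on an error streak counter.
# The threshold is illustrative; real deployments would track windows
# per agent, per session, per tool, and per resource.

class CircuitBreaker:
    def __init__(self, max_errors: int = 5):
        self.max_errors = max_errors
        self.errors = 0
        self.open = False  # open circuit = session halted, escalate to a human

    def record(self, success: bool) -> None:
        if success:
            self.errors = 0  # a healthy call resets the error streak
        else:
            self.errors += 1
            if self.errors >= self.max_errors:
                self.open = True  # stop retrying; page the on-call

    def allow(self) -> bool:
        return not self.open

breaker = CircuitBreaker(max_errors=3)
for ok in [True, False, False, False]:
    breaker.record(ok)
```

Once open, the breaker stays open until a human resets it; the kill switches described above are the manual, global version of the same control.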

Isolate execution environments

If your agent executes code, renders untrusted HTML, runs shell commands, or browses the web, that activity should happen inside a sandbox with its own identity, its own filesystem, and its own network policy. The sandbox should not share credentials with the orchestrator, should not have access to secrets it does not strictly need, and should have egress restricted to the specific destinations the task requires. Ephemeral sandboxes that are destroyed between tasks are strictly better than long-lived ones, because they prevent state (including injected state) from persisting across sessions.

The principle extends beyond code execution. Browser agents should run in ephemeral profiles with no access to the user's cookies for other sites. File-editing agents should be given access to a working copy, not to the canonical store. Multi-tenant agent platforms should isolate tenants at an infrastructure level, not just with row-level filters in application code, because a prompt injection that bypasses application-level tenancy checks will otherwise reach another tenant's data. The working assumption is that the agent will eventually be tricked into trying to cross a boundary, and the only reliable protection is to make the boundary something stronger than a string check.

Plan for revocation, rotation, and deprovisioning from day one

Agents get retired, repurposed, forked, and leaked. The systems that grant them access must support fast, complete revocation, and you should exercise that path regularly rather than discovering at 2 AM that you cannot actually turn an agent off. Every credential the agent holds should have a documented revocation procedure and a tested recovery plan. Every integration should have an owner who is notified when credentials are rotated. When an agent deployment is decommissioned, its identity should be deprovisioned from every downstream system it ever touched, not just the ones anyone remembers.

A useful discipline is to periodically ask, for each agent in production, "if this agent were compromised right now, what is the complete list of systems I would need to contact, and how long would it take?" If the answer is uncertain or long, the access model needs tightening before the next agent is deployed, not after the next incident.

Building on proven primitives with WorkOS

Implementing the full stack above from scratch is possible, but rarely advisable. Most of the hardest problems (distinct agent identities, role assignment synced from enterprise IdPs, scoped and hierarchical authorization, fast session checks) are problems B2B SaaS teams have been solving for a decade. WorkOS packages them as primitives that extend cleanly to agent principals, rather than leaving you to retrofit them later.

AuthKit handles authentication across the three shapes agent auth actually takes. For human users acting through an agent, it provides SSO, MFA, passkeys, SCIM directory sync, and IdP role assignment, so the agent has a real authenticated user identity to bind its delegation to. For agents authenticating as themselves with no user in the loop, AuthKit issues tokens via the OAuth 2.0 client credentials flow, which is the standard machine-to-machine pattern and the right primitive for headless, scheduled, or service-like agents. And when agent capabilities are exposed through MCP, AuthKit acts as a spec-compatible OAuth 2.1 authorization server for MCP servers, handling Dynamic Client Registration, PKCE, Protected Resource Metadata, and token issuance, so your MCP server only has to verify the JWTs it receives.

RBAC provides the coarse-grained layer. Roles and permissions are embedded in the access token for local checks without an API round trip, and IdP role assignment lets the same directory that governs your users govern agent authority too.

FGA (Fine-Grained Authorization) addresses the resource-scoped, context-aware problem that flat RBAC cannot express. It keeps the familiar role-and-permission model but attaches roles to nodes in a resource hierarchy, so permissions flow from a workspace down to its projects and branches automatically, without the combinatorial explosion of flat roles. Because authorization is evaluated against a deterministic resource graph rather than the agent's self-reported intent, an agent cannot talk its way into reaching a resource it was never granted. That makes FGA the layer that most directly answers the confused deputy problem in the agent context.

If you are shipping agent features and want an enterprise-ready authorization foundation instead of a weekend of glue code, get started with WorkOS →
