March 31, 2026

The architecture of governable AI agents: Constrain first, observe always

How to design AI agents that do less, prove more, and stay within boundaries your security team can actually audit.

AI agents aren't chatbots anymore. They schedule meetings, move money, provision infrastructure, and execute multi-step workflows across production systems. They do this autonomously, often without a human reviewing every action before it fires.

That autonomy is the point. It's also the problem. The OWASP Top 10 for Agentic Applications catalogs the most critical risks facing these systems in production, from goal hijacking to rogue agents. But before those ten risks, the framework foregrounds two foundational design principles: least agency and strong observability.

Every risk in the Top 10 traces back to a failure in one or both. These are the architectural decisions that determine whether your agent is governable at all. This article is about what it actually takes to implement them.

Least agency is not least privilege

If you've worked in enterprise security, you've encountered the principle of least privilege: give users the minimum permissions required to do their job. It's a good principle. It's also insufficient for agentic systems.

Least privilege answers one question: what can this entity access? Least agency answers a harder one: how much freedom does this entity have to act on that access without checking back?

The distinction matters because agents don't just read data and return results. They plan. They chain actions together. They make decisions about what to do next based on what happened in the previous step. An agent with read-only access to a customer database and write access to an email tool has very limited privilege. But if it can autonomously decide which customers to query, compose messages based on those queries, and send them without approval, it has enormous agency.

Least privilege would say the permissions are fine. Least agency would say the autonomy is not.

Agency has dimensions, not just levels

When people first hear "least agency," they tend to think of it as a dial you turn down: less autonomy, more human oversight. That's part of it, but the concept is more nuanced than a single slider. Agency decomposes into at least four dimensions that you need to reason about independently.

  • Scope of action. Which tools can the agent invoke, and with what parameters? This is closest to traditional least privilege, but extends beyond it. It's not just whether the agent can call the billing API. It's whether it can call it with arbitrary customer IDs, or only the ID of the customer in the current session.
  • Depth of planning. How many steps can the agent chain together before requiring a checkpoint? A single-step agent that retrieves a document is fundamentally different from a multi-step agent that retrieves a document, extracts financial data, updates a spreadsheet, and sends a summary to the CFO. Both might use the same tools. The risk profile is completely different.
  • Breadth of delegation. Can the agent invoke other agents? Can those sub-agents invoke their own sub-agents? Delegation chains create transitive trust relationships that are notoriously hard to reason about. If Agent A trusts Agent B, and Agent B trusts Agent C, does Agent A trust Agent C? In most current implementations, the answer is implicitly yes, and nobody designed it that way.
  • Reversibility of actions. Some actions can be undone. A draft email can be deleted. A database query returns results but doesn't change state. Other actions are irreversible: sending an email, executing a payment, deleting a production resource. The required level of oversight should scale with irreversibility.

These dimensions interact. A narrow-scope agent with shallow planning depth and only reversible actions might safely operate with full autonomy. A broad-scope agent with deep planning, delegation authority, and access to irreversible actions needs checkpoints at every stage. The mistake most teams make is treating agency as uniform when it's actually a matrix.
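The matrix framing can be made concrete. The sketch below scores the four dimensions on a hypothetical 0-3 rubric and maps the resulting profile to an oversight tier; the class name, scoring scale, and thresholds are all illustrative assumptions, not a standard.

```python
from dataclasses import dataclass

# Hypothetical rubric: each dimension scored 0 (tightly constrained) to 3 (unconstrained).
@dataclass(frozen=True)
class AgencyProfile:
    scope_of_action: int      # breadth of tools/parameters the agent can reach
    planning_depth: int       # steps chained before a checkpoint
    delegation_breadth: int   # ability to invoke sub-agents
    irreversibility: int      # how permanent the worst-case action is

def required_oversight(profile: AgencyProfile) -> str:
    """Map an agency profile to an oversight tier (illustrative thresholds)."""
    # Irreversibility dominates: anything close to permanent needs a human gate.
    if profile.irreversibility >= 2:
        return "approval-per-action"
    score = (profile.scope_of_action + profile.planning_depth
             + profile.delegation_breadth + profile.irreversibility)
    if score >= 6:
        return "checkpoint-per-plan-step"
    if score >= 3:
        return "review-sampled-sessions"
    return "autonomous-with-monitoring"

# A read-only retrieval agent vs. a broad multi-agent orchestrator.
retriever = AgencyProfile(1, 0, 0, 0)
orchestrator = AgencyProfile(3, 3, 2, 1)
```

The point of the exercise isn't the exact numbers; it's that the same permission set can land in different oversight tiers depending on planning depth, delegation, and reversibility.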

Designing for least agency in practice

Least agency isn't something you bolt on after building your agent. It's an architectural decision that shapes how you design tool interfaces, plan approval workflows, and structure your agent's interaction with the rest of your system.

  • Decompose broad tools into narrow ones. If your agent has a "database tool" that can run arbitrary SQL, you've given it maximum agency over your data layer. Instead, expose specific query functions: get_customer_by_id, list_recent_orders, search_knowledge_base. Each function has a defined input schema, a predictable output shape, and a clear authorization boundary. The agent can't construct a query you didn't anticipate because it doesn't have a query-construction tool.
  • Make approval gates a first-class concept. Humans in the loop aren't a fallback for when things go wrong. They're a design pattern for managing agency. Define which actions require approval, who can approve them, and what happens when approval is denied or times out. This should be part of your agent's workflow definition, not an afterthought handled by a Slack notification that someone might notice.
  • Earn autonomy through demonstrated reliability. A new agent should start with minimal autonomy: every action requires approval, every plan is reviewed. As the agent demonstrates consistent, correct behavior in a specific domain, you can incrementally expand its autonomy for that domain. This isn't just a nice idea. It's how you build the behavioral baselines that make observability useful (more on that below).
  • Scope credentials to the task, not the agent. When an agent needs to perform an action, it should receive credentials scoped to exactly that action, with a time-bound expiration. If the agent's job is to update a support ticket, it gets a token that can update that specific ticket, and the token expires when the task completes. This is where OAuth 2.1 with sender-constrained tokens becomes essential: the token is bound to the specific client that requested it and can't be replayed by a compromised component.
  • Enforce boundaries on delegation. If Agent A delegates to Agent B, Agent B's permissions should be a subset of Agent A's, and Agent A's should be a subset of the authorizing user's. This is the principle of permission attenuation, and it should be enforced by your authorization layer, not by hoping that Agent B respects a natural-language instruction to "only access workspace data."
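Two of the patterns above lend themselves to short sketches: narrow tool interfaces and permission attenuation. The function names, the session-scoping rule, and the permission strings below are hypothetical; the point is the shape of the check, not a specific API.

```python
# 1. Narrow tools: a fixed function with a constrained input, instead of raw SQL.
def get_customer_by_id(customer_id: str, session_customer_id: str) -> dict:
    """Only the customer bound to the current session may be queried."""
    if customer_id != session_customer_id:
        raise PermissionError("customer_id outside session scope")
    # ... fetch from the data layer here; stubbed for the sketch.
    return {"id": customer_id}

# 2. Permission attenuation: a delegate's permissions must be a subset
#    of the delegator's. Never widen on delegation; fail closed.
def attenuate(delegator_perms: frozenset[str],
              requested: frozenset[str]) -> frozenset[str]:
    if not requested <= delegator_perms:
        raise PermissionError(f"cannot grant {requested - delegator_perms}")
    return requested
```

Note that the attenuation check is enforced in code at the authorization layer, not delegated to a natural-language instruction the downstream agent may or may not follow.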

Strong observability is not logging

Every production system has logging. You write events to a file or a log aggregator, and when something goes wrong, you search through them to figure out what happened. That's useful, but it's not what the OWASP framework means by strong observability.

Strong observability for agentic systems means maintaining a continuous, queryable understanding of three things: what the agent is doing, why it's doing it, and on whose authority. The "why" is what separates observability from logging. A log tells you the agent called the billing API at 14:32:07. Observability tells you the agent called the billing API because step 3 of its plan required retrieving invoice data for the customer referenced in the support ticket that triggered the workflow, and the action was authorized by the user session that initiated the ticket triage.

The three layers of agentic observability

Useful observability for agents operates at three distinct layers, each answering different questions.

  • Decision-layer observability captures the agent's reasoning process. What plan did the agent construct? What alternatives did it consider? What information influenced its decision to take action A instead of action B? This layer is the hardest to implement because LLM reasoning isn't deterministic or easily introspectable, but it's also the most valuable for detecting goal hijacking (ASI01) and understanding cascading failures (ASI08). At minimum, you should be logging the agent's plan at each step, including the full context that informed it.
  • Action-layer observability captures what the agent actually did. Every tool invocation, every API call, every inter-agent message. This is closest to traditional logging, but with a critical addition: each action must be linked to the decision that triggered it and the identity that authorized it. An action log entry should answer: which tool was called, with what arguments, by which agent, on behalf of which user, as part of which plan step, and what was the result.
  • Identity-layer observability captures the authorization chain behind every action. Who is the human user that initiated this workflow? What are their permissions? What scoped credentials did the agent receive? Did any delegation occur, and if so, what permissions were attenuated? This layer is essential for compliance and audit, but it also serves a security function: if you can trace every action to a specific identity chain, you can detect when an agent is operating outside its authorized scope.
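A single event that links all three layers might look like the sketch below. Every field name is illustrative; the structural point is that the action-layer record carries explicit links back to the decision layer (`plan_step`) and the identity layer (`user_id`, `authz_decision`).

```python
import uuid
import datetime

def action_event(*, agent_id: str, user_id: str, workflow_id: str,
                 plan_step: int, tool: str, args: dict,
                 authz_decision: str, result: str) -> dict:
    """One action-layer event, correlated to the decision and identity layers.
    All field names are illustrative, not a standard schema."""
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent_id": agent_id,              # who performed the action
        "user_id": user_id,                # on whose authority
        "workflow_id": workflow_id,        # which workflow this belongs to
        "plan_step": plan_step,            # link back to the decision layer
        "tool": tool,                      # what was actually invoked
        "args": args,
        "authz_decision": authz_decision,  # link back to the identity layer
        "result": result,
    }
```

With this shape, "which plan step caused this billing call, and who authorized it?" becomes a field lookup rather than a forensic reconstruction.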

Why the two principles are inseparable

The OWASP framework presents least agency and strong observability as complementary, and it's worth being explicit about why neither works without the other.

Least agency without observability is blind constraint. You've limited what the agent can do, but you have no way to verify that the limits are working, no way to detect when they fail, and no data to inform decisions about when to expand or contract the agent's autonomy. You're flying on faith that your permission model matches reality.

Observability without least agency is surveillance of a system you haven't bothered to constrain. You can see everything the agent is doing, but it has broad permissions to do almost anything, so your observability data is a firehose of legitimate actions that makes anomaly detection nearly impossible. When everything is permitted, nothing looks anomalous.

The two principles create a feedback loop. Least agency reduces the agent's action space, which makes observability tractable. Observability provides the data you need to tune the agent's agency, tightening constraints where you see unnecessary actions, relaxing them where the agent consistently operates correctly within bounds. Without this loop, you're either over-constraining your agents (killing their utility) or under-constraining them (accepting unmanaged risk).

Building observability that scales

The practical challenge of agentic observability is volume. A single agent workflow might generate dozens of tool invocations, each with its own authorization check, input parameters, and output. A multi-agent system can generate hundreds of events per minute. If your observability system can't handle that volume in a way that remains queryable and actionable, it's not observability. It's a write-only log.

  • Structure your events for querying, not just storage. Every event should carry a consistent set of fields: agent ID, user ID, session ID, workflow ID, step number, tool name, authorization decision, timestamp, and outcome. If you can't filter your event stream to show "all actions taken by Agent X on behalf of User Y in Workflow Z," your structure needs work.
  • Correlate across layers. A tool invocation event (action layer) should link to the plan step that triggered it (decision layer) and the authorization check that permitted it (identity layer). Without these links, you're looking at three separate streams of events with no way to reconstruct the full picture.
  • Define behavioral baselines. This is where least agency feeds back into observability. If you know that your support agent typically invokes 3-5 tools per ticket, reads 1-2 customer records, and never accesses billing data, those patterns become your baseline. An invocation of the billing API, or a session with 30 tool calls, triggers an alert not because the action was unauthorized, but because it deviates from the established pattern.
  • Make observability actionable, not just archival. Real-time monitoring should trigger circuit breakers when anomalies are detected. If your support agent suddenly starts querying customer records in bulk, you don't want to discover that in a weekly audit. You want the system to pause the agent, flag the session, and require human review before resuming.
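The baseline-plus-circuit-breaker idea from the last two bullets can be sketched in a few lines. This is a deliberately minimal in-memory version with assumed thresholds (an allowed-tool set and a per-session call ceiling); a production system would derive the baseline from historical data and persist state.

```python
from collections import Counter

class BaselineMonitor:
    """Illustrative anomaly check against a per-agent baseline:
    a set of expected tools and a ceiling on tool calls per session."""

    def __init__(self, allowed_tools: set[str], max_calls_per_session: int):
        self.allowed_tools = allowed_tools
        self.max_calls = max_calls_per_session
        self.calls: Counter = Counter()   # session_id -> call count
        self.paused: set[str] = set()     # sessions awaiting human review

    def record(self, session_id: str, tool: str) -> bool:
        """Return True if the session may continue, False if the breaker trips."""
        if session_id in self.paused:
            return False
        self.calls[session_id] += 1
        if tool not in self.allowed_tools or self.calls[session_id] > self.max_calls:
            self.paused.add(session_id)  # pause, flag, require review before resuming
            return False
        return True
```

The alert fires not because an individual action was unauthorized, but because the session deviated from the established pattern, which is exactly the distinction the baseline exists to capture.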

How every OWASP risk maps back to these principles

The ten risks in the OWASP Top 10 for Agentic Applications aren't isolated vulnerabilities. Each one represents a specific failure mode where least agency, strong observability, or both were absent.

  1. ASI01 (Goal Hijacking) is what happens when an agent has enough agency to redefine its own objectives based on untrusted input, and there's no observability into its planning process to detect the shift.
  2. ASI02 (Tool Misuse) is what happens when tools are scoped too broadly (agency failure) and tool chain sequences aren't monitored for anomalous patterns (observability failure).
  3. ASI03 (Identity and Privilege Abuse) is a direct agency failure: agents inherit overly broad credentials instead of operating with scoped, time-bound identity.
  4. ASI04 (Supply Chain Vulnerabilities) is what happens when the trust boundary around external tools and dependencies isn't constrained (agency) and integrity isn't verified at load time (observability).
  5. ASI05 (Unexpected Code Execution) is maximum agency (the ability to generate and execute arbitrary code) without corresponding observability into what that code does.
  6. ASI06 (Memory Poisoning) is what happens when agents have unconstrained write access to their own memory (agency) and memory mutations aren't validated or audited (observability).
  7. ASI07 (Insecure Inter-Agent Communication) is what happens when delegation is unrestricted (agency) and messages between agents aren't authenticated or logged (observability).
  8. ASI08 (Cascading Failures) is what happens when agents can chain unlimited steps without checkpoints (agency) and error propagation isn't monitored (observability).
  9. ASI09 (Human-Agent Trust Exploitation) is what happens when agents have the agency to present manipulated information with the same authority as verified information, and there's no observability mechanism for users to check provenance.
  10. ASI10 (Rogue Agents) is the terminal failure: an agent with enough agency to cause harm and insufficient observability to detect that it's been compromised.

The pattern is consistent. Constrain the agent's freedom to act (least agency), maintain continuous visibility into what it does and why (strong observability), and use each principle to reinforce the other.

Implementing the principles with identity infrastructure

Both principles converge on identity. Least agency requires an authorization layer that can express fine-grained, resource-scoped, time-bound permissions. Strong observability requires an audit layer that can attribute every action to a verified identity chain. You can't implement either without answering the fundamental question: who is this agent, and what is it allowed to do right now?

Agent identity as a first-class concept

Agents need their own managed identities, separate from the users who invoke them. This isn't just a security nicety. It's a prerequisite for both principles.

For least agency, separate identity means you can assign the agent permissions that are narrower than the user's. A user might have access to their entire workspace; the agent they invoke for a specific task should only access the resources relevant to that task. OAuth 2.1 with the client credentials flow gives each agent its own client ID and scoped tokens, so the agent never borrows the user's session. If the agent is compromised, you revoke the agent's credentials without affecting the user.

For strong observability, separate identity means every action in your audit trail has two attributions: the agent that performed it and the user who authorized the workflow. Without this separation, your audit log shows "User X accessed billing data" when the reality is "Agent Y accessed billing data on behalf of User X as part of Workflow Z." The difference matters for incident response, compliance, and understanding what actually happened.
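The properties a separate agent credential needs, namely its own identity, a user it acts on behalf of, scopes narrower than that user's, and a short lifetime, can be modeled directly. In a real deployment this would be an OAuth 2.1 access token issued via the client credentials flow; the class below is only a sketch of those properties, with hypothetical names.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentCredential:
    agent_id: str           # the agent's own client identity, not the user's
    on_behalf_of: str       # the user who authorized the workflow
    scopes: frozenset       # narrower than the user's own permissions
    expires_at: float       # unix time; short-lived by construction

    def allows(self, scope: str) -> bool:
        """Both conditions must hold: scope granted and token not expired."""
        return scope in self.scopes and time.time() < self.expires_at

# A support agent acting for user-42, read-only, expiring in five minutes.
cred = AgentCredential("agent-support-1", "user-42",
                       frozenset({"tickets:read"}), time.time() + 300)
```

Because the credential names both identities, the audit trail can record "Agent Y on behalf of User X" instead of collapsing both into the user.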

Authorization at every trust boundary

Role-based access control gives you the first layer: the agent's role determines which categories of actions it can perform. A "support-agent" role can read tickets and write internal notes. A "data-analyst-agent" role can run read-only queries. Roles are checked on every tool invocation by inspecting the token claims, so the enforcement happens at the tool level, not just the agent level.

But RBAC alone can't express resource-scoped constraints. Can this agent access this specific workspace's data? Can it modify this particular project? Fine-grained authorization (FGA) extends RBAC by scoping roles to specific resources in a hierarchy. You define resource types (organizations, workspaces, projects, tools), create resources at runtime with parent-child relationships, and assign roles on specific resources. Permissions inherit downward through the hierarchy, and the tenant boundary is structural, not a query filter your code remembers to apply.

This combination (RBAC for action categories, FGA for resource scope) is how you implement least agency at the authorization layer. The agent's role limits what types of actions it can perform. The resource scope limits which specific resources it can perform them on. Both are time-bound through token expiration. Both are revocable. Both generate audit events.
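The two-question check, "what type of action?" and "on which specific resource?", composes cleanly. The sketch below is not any vendor's API; the role names, resource identifiers, and hierarchy are assumptions made for illustration. It shows a role table for the action category plus downward permission inheritance through a resource hierarchy.

```python
# RBAC: role -> permitted action categories (illustrative).
ROLE_ACTIONS = {
    "support-agent": {"tickets:read", "notes:write"},
    "data-analyst-agent": {"reports:read"},
}

# FGA-style hierarchy: child resource -> parent resource (illustrative).
PARENT = {
    "project:alpha": "workspace:acme",
    "workspace:acme": "org:acme",
}

def inherits(resource: str, granted_on: str) -> bool:
    """Permissions inherit downward: a grant on an ancestor covers the child."""
    node = resource
    while node is not None:
        if node == granted_on:
            return True
        node = PARENT.get(node)
    return False

def authorized(role: str, action: str, resource: str, grants: set) -> bool:
    """Both checks must pass: the role permits the action category,
    and some grant covers the specific resource."""
    return action in ROLE_ACTIONS.get(role, set()) and any(
        inherits(resource, g) for g in grants)
```

Note that an agent granted on `workspace:acme` structurally cannot reach a resource under a different workspace: there is no query filter to forget, because the hierarchy walk never arrives at the grant.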

Audit as the observability backbone

Every authorization check, whether granted or denied, becomes an audit event. Every tool invocation carries the agent's identity, the user's identity, the workflow context, and the authorization decision. This is the identity-layer observability described above, and it comes for free when your authorization infrastructure is designed to produce it.

The audit trail also closes the loop on least agency. If you see that your support agent never invokes the billing tool, you can remove billing permissions from its role with confidence. If you see it occasionally needs to read a customer's payment status, you can add a narrow, read-only billing permission scoped to the current customer. Your observability data informs your agency decisions, and your agency constraints keep your observability data manageable. The flywheel turns.
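The "remove unused permissions with confidence" step is a simple query over the audit trail. A minimal sketch, assuming audit events carry a `permission` and `decision` field as in the event schema discussed earlier (both field names are assumptions):

```python
from collections import Counter

def tune_role(granted: set, audit_events: list, min_uses: int = 1) -> dict:
    """Compare a role's granted permissions against what the audit trail
    shows was actually exercised; flag never-used grants for removal."""
    used = Counter(e["permission"] for e in audit_events
                   if e["decision"] == "granted")
    return {
        "remove_candidates": sorted(p for p in granted if used[p] < min_uses),
        "keep": sorted(p for p in granted if used[p] >= min_uses),
    }
```

Run periodically, this is the flywheel in code: observability data narrows agency, and narrower agency keeps the event stream small enough to keep analyzing.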

Start with constraints, expand with evidence

If you're building agentic applications and aren't sure where to begin, the ordering matters: start with least agency, add observability, then use observability data to tune agency over time.

  • Audit your agent's current capabilities. List every tool, API, and system your agent can reach. For each one, ask: does the agent need this for its defined task? If not, remove access. For the ones it does need, ask: does it need write access or just read? Does it need access to all records or just specific ones? Narrow everything.
  • Give agents their own identity. Stop letting agents borrow user sessions. Issue scoped, time-bound credentials through OAuth 2.1. Use client credentials for machine-to-machine communication and authorization code with PKCE for user-authorized actions.
  • Enforce authorization at the tool level. Every tool invocation should check the agent's role and resource scope before executing. Use RBAC for action categories and FGA for resource-level control.
  • Log everything with proper attribution. Every action gets an audit event that links the agent identity, user identity, workflow context, and authorization decision.
  • Build behavioral baselines from production data. Once you have observability running, you have the data to define what "normal" looks like for each agent type. Deviations from that baseline trigger alerts and, for high-risk anomalies, automatic circuit breakers.
  • Expand autonomy incrementally. When an agent has consistently operated correctly within its current constraints for a meaningful period, consider relaxing specific boundaries. Each expansion should be logged and reversible.
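The middle steps of that checklist, tool-level authorization plus attributed audit events, meet in a single wrapper around every tool invocation. The sketch below uses hypothetical names throughout and a plain allow-set in place of a real authorization service; the structural point is that the audit event is written before the permission decision short-circuits, so denied calls are recorded too.

```python
def invoke_tool(tool_name: str, args: dict, *, agent_id: str, user_id: str,
                workflow_id: str, allowed_tools: set, tools: dict,
                audit_log: list):
    """Authorize, audit, then execute. Denied invocations are audited as well."""
    decision = "granted" if tool_name in allowed_tools else "denied"
    audit_log.append({
        "agent_id": agent_id,        # who performed the action
        "user_id": user_id,          # on whose authority
        "workflow_id": workflow_id,  # workflow context
        "tool": tool_name,
        "args": args,
        "decision": decision,        # the authorization outcome itself
    })
    if decision == "denied":
        raise PermissionError(f"{agent_id} may not call {tool_name}")
    return tools[tool_name](**args)
```

Every path through this function leaves a trace, which is what makes the behavioral baselines in the previous step possible at all.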

The principles are clear. The harder question is how you implement them without building an entire authorization and audit infrastructure from scratch.

WorkOS for governable AI agents

The patterns described throughout this article (scoped identity, layered authorization, structured audit trails) map directly to what WorkOS is built to provide on the authorization side.

For least agency, WorkOS RBAC and Fine-Grained Authorization work together to enforce constraints at every tool invocation. RBAC controls what categories of actions an agent can perform. FGA scopes those permissions to specific resources in a hierarchy, so an agent authorized on one workspace structurally cannot reach another. Together they answer the two questions that least agency demands: "what type of action?" and "on which specific resource?"

If you're already using WorkOS AuthKit for authentication, adding agent identity and tool-level authorization is incremental. AuthKit handles the OAuth 2.1 flows, giving each agent its own scoped, time-bound credentials. RBAC and FGA enforce permissions at the tool level. The OWASP Top 10 for Agentic Applications documents what goes wrong when these principles are absent. WorkOS gives you the infrastructure to implement least agency without building it yourself.
