Securing agentic apps: How to stop your AI agents from misusing their own tools
Your agent has access to a database, a file system, and an email sender. Each tool is legitimate. The misuse is in the combination.
Your agent has access to three tools: a database query tool, a file writer, and an email sender. Individually, each tool is fine. Each was built for a legitimate purpose, reviewed by your team, and deployed to a trusted MCP server.
Now imagine a crafted support ticket that contains hidden instructions. The agent reads the ticket, queries the database for all customer records matching a broad filter, writes the results to a temporary file, and emails that file to an external address. Every tool call was valid. Every argument matched the schema. Every invocation was authorized by the agent's role. And your customer database was just exfiltrated through a perfectly legitimate tool chain.
This is ASI02 from the OWASP Top 10 for Agentic Applications: tool misuse and exploitation. It sits between identity (who can act) and supply chain (can you trust the tools), and it asks a different question: what happens when a trusted agent calls trusted tools in ways nobody anticipated?
The previous articles in this series covered giving agents their own credentials and vetting the tools they depend on. If you've implemented those controls, your agent has scoped identity, your tools are verified, and your MCP servers are authenticated. ASI02 is the gap that remains: the agent is authorized to use each tool, but the sequence, arguments, and context of its tool calls can still cause harm.
Why authorization alone doesn't catch this
If you followed the ASI03 guide, your agent's tool invocations are checked against RBAC at the role level and FGA at the resource level. The support agent can call query_database with tickets:read permission. The tool checks pass. The call executes.
The problem is that authorization answers a binary question: is this agent allowed to call this tool on this resource? It doesn't answer qualitative questions about how the tool is being called:
- Are the arguments safe? The agent has tickets:read permission and calls the database tool. But the query is SELECT * FROM customers WHERE 1=1, which dumps the entire table rather than looking up a specific record. The permission check passed because the agent has read access. The query itself is the problem.
- Is this sequence normal? The agent calls the database tool, then the file writer, then the email sender. Each call is individually authorized. But this specific sequence, in this order, constitutes data exfiltration. No single tool call is malicious. The chain is.
- Is the context appropriate? The agent calls a billing tool to look up invoice details. That's its job. But it's doing it because a manipulated document told it to, not because a customer asked. The tool call is legitimate. The reason behind it is not.
Authorization is necessary but not sufficient. You need a policy layer on top of authorization that evaluates tool calls in context: what arguments are being passed, what tools were called before this one, and whether the overall pattern matches expected behavior.
The three types of tool misuse
Tool misuse in agentic systems falls into three categories, each requiring different controls.
Dangerous arguments to legitimate tools
The tool is right. The permission check passes. The arguments are the problem.
A query_database tool that accepts arbitrary SQL is the most obvious example. The agent has read permission, but nothing prevents it from running SELECT * FROM customers instead of SELECT * FROM customers WHERE id = ?. A write_file tool that accepts any path can be directed to write outside the agent's workspace. An http_request tool that accepts any URL can be pointed at internal services (SSRF) or used to exfiltrate data to an external endpoint.
The fix is argument validation at the invocation boundary, before the call reaches the MCP server:
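A minimal sketch of such a validation layer in TypeScript (the tool names, argument shapes, and rules here are illustrative, not a definitive implementation):

```typescript
type ToolArgs = Record<string, unknown>;
type Verdict = { allowed: boolean; reason?: string };
type ArgPolicy = (args: ToolArgs) => Verdict;

// One validation rule per tool. Default-deny: tools absent from this
// map are blocked until someone writes a policy for them.
const argPolicies: Record<string, ArgPolicy> = {
  query_database: (args) => {
    const sql = String(args.sql ?? "");
    // Reject unbounded reads: require a WHERE clause, and block the
    // classic "WHERE 1=1" full-table dump.
    if (!/\bWHERE\b/i.test(sql) || /WHERE\s+1\s*=\s*1/i.test(sql)) {
      return { allowed: false, reason: "unbounded query" };
    }
    return { allowed: true };
  },
  write_file: (args) => {
    const path = String(args.path ?? "");
    // Confine writes to the agent's workspace; block path traversal.
    if (!path.startsWith("/workspace/") || path.includes("..")) {
      return { allowed: false, reason: "path outside workspace" };
    }
    return { allowed: true };
  },
  http_request: (args) => {
    const url = new URL(String(args.url ?? ""));
    // Allowlist of external hosts; blocks SSRF against internal services.
    const allowedHosts = ["api.example.com"]; // illustrative host
    return allowedHosts.includes(url.hostname)
      ? { allowed: true }
      : { allowed: false, reason: `host ${url.hostname} not allowlisted` };
  },
};

function validateArguments(tool: string, args: ToolArgs): Verdict {
  const policy = argPolicies[tool];
  if (!policy) return { allowed: false, reason: `no policy for tool ${tool}` };
  return policy(args);
}
```

A real policy for a SQL tool would parse the query rather than pattern-match it; the point is the shape: one explicit rule per tool, evaluated before the call leaves your process.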
This is not glamorous work. It's writing validation rules for every tool your agent can call, one argument at a time. But it catches the most common class of tool misuse: the agent calling a permitted tool with arguments that exceed the intended scope.
Note the default-deny pattern. If a tool isn't in the policy map, the call is blocked. New tools require explicit policy before the agent can use them.
Dangerous tool chains
Each tool call is individually fine. The sequence is the problem.
The database-then-file-then-email chain described in the opening is the classic example. But the pattern shows up in subtler ways. An agent that reads a customer's support history, then queries their billing records, then generates a summary might be doing exactly its job. Or it might be assembling a dossier in response to a prompt injection hidden in one of the support tickets.
Tool chain analysis requires tracking the sequence of calls within a session and flagging patterns that match known-dangerous combinations:
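One way to sketch this in TypeScript (the patterns, tool names, and actions are illustrative; real patterns come from reviewing your own audit logs):

```typescript
type ChainAction = "block" | "require_approval" | "log";

interface ChainPattern {
  sequence: string[]; // ordered subsequence of tool names to match
  action: ChainAction;
  description: string;
}

const chainPatterns: ChainPattern[] = [
  {
    // The exfiltration chain from the opening example: block outright.
    sequence: ["query_database", "write_file", "send_email"],
    action: "block",
    description: "read -> file -> external email (exfiltration)",
  },
  {
    // Broad read followed by email might be legitimate: ask a human.
    sequence: ["query_database", "send_email"],
    action: "require_approval",
    description: "database read followed by outbound email",
  },
];

// Returns true if `sub` appears in `calls` as an ordered subsequence.
function matchesSubsequence(calls: string[], sub: string[]): boolean {
  let i = 0;
  for (const call of calls) {
    if (call === sub[i]) i++;
    if (i === sub.length) return true;
  }
  return false;
}

// Evaluate the session history (including the call about to execute)
// and return the most severe action any matching pattern demands.
function checkChain(
  sessionCalls: string[],
): { action: ChainAction | "allow"; description?: string } {
  const severity: ChainAction[] = ["block", "require_approval", "log"];
  for (const level of severity) {
    const hit = chainPatterns.find(
      (p) => p.action === level && matchesSubsequence(sessionCalls, p.sequence),
    );
    if (hit) return { action: hit.action, description: hit.description };
  }
  return { action: "allow" };
}
```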
The key design decision here is what happens when a dangerous chain is detected. Blocking outright is appropriate for clearly dangerous patterns (data exfiltration). Requiring human approval is appropriate for ambiguous patterns that might be legitimate in context (broad read followed by email). Logging without blocking is appropriate for patterns you want to monitor but haven't yet confirmed as dangerous.
In practice, you start with logging. Run the chain monitor in audit mode for a few weeks, review the flagged patterns, and then decide which ones warrant blocking versus approval versus continued monitoring.
Emergent misuse from multi-step reasoning
This is the hardest category because it can't be caught by static rules. The agent is given a complex task, reasons through multiple steps, and arrives at a tool usage pattern that no human would have anticipated or approved.
An agent asked to "reduce our cloud costs" might discover it has access to a deployment tool and start terminating production instances. An agent asked to "clean up the customer database" might interpret "clean up" as deleting records rather than deduplicating them. The tools are used correctly from a technical standpoint. The agent's interpretation of the goal is wrong.
This class of misuse requires a combination of controls:
- Action classification. Categorize tool calls by their reversibility. Read operations are low risk. Write operations that can be rolled back are medium risk. Destructive operations (deletes, deployments, financial transactions) are high risk. High-risk operations should require explicit human approval regardless of the agent's authorization level.
- Plan validation. For complex multi-step tasks, require the agent to declare its plan before executing. The orchestrator reviews the plan against the original task scope and rejects plans that include tools outside the expected set.
- Circuit breakers. Set hard limits on the number of tool calls per session, the rate of calls per minute, and the number of high-risk calls per task. If the agent hits a circuit breaker, execution halts and a human reviews what happened.
Putting the layers together
The complete invocation policy stack has four layers, evaluated in order on every tool call:
- Authentication. Is this agent who it claims to be? (Covered in the credentials guide.)
- Authorization. Does this agent's role permit this tool, and does FGA permit this resource? (Also covered in the credentials guide.)
- Argument validation. Are the specific arguments to this tool call within the permitted bounds?
- Chain and context analysis. Does this call, combined with recent calls in the session, match a dangerous pattern? Does the risk level of this call require human approval?
The middleware that enforces all four layers on an MCP server looks like this:
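Here is one way that composition might be sketched in TypeScript. The layer implementations are injected as stubs; in practice they would be the authentication, RBAC/FGA, argument-validation, and chain-analysis components described above:

```typescript
type ToolCall = { agentId: string; tool: string; args: Record<string, unknown> };
type Decision = { allow: boolean; layer?: string; reason?: string };

interface PolicyLayers {
  authenticate(call: ToolCall): boolean;                  // layer 1
  authorize(call: ToolCall): boolean;                     // layer 2: RBAC + FGA
  validateArgs(call: ToolCall): boolean;                  // layer 3
  checkChain(call: ToolCall, history: string[]): boolean; // layer 4
}

function deny(layer: string, call: ToolCall): Decision {
  // Every denied call is logged with the layer that rejected it.
  console.log(`DENY ${call.agentId} ${call.tool} at ${layer}`);
  return { allow: false, layer, reason: `failed ${layer}` };
}

function enforce(layers: PolicyLayers, call: ToolCall, history: string[]): Decision {
  // Evaluated in order; the first failing layer denies the call.
  if (!layers.authenticate(call)) return deny("authentication", call);
  if (!layers.authorize(call)) return deny("authorization", call);
  if (!layers.validateArgs(call)) return deny("argument_validation", call);
  if (!layers.checkChain(call, history)) return deny("chain_analysis", call);
  // Permitted calls are logged with full context for the audit trail.
  console.log(`ALLOW ${call.agentId} ${call.tool}`, { args: call.args, history });
  return { allow: true };
}
```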
Every denied call is logged with the specific reason. Every permitted call is logged with the full context. This feeds the audit trail described in the credentials guide and provides the data you need to tune your policies over time.
Building and tuning your policies
Nobody gets tool policies right on the first try. The process is iterative:
- Start in audit mode. Deploy the argument validation, chain analysis, and circuit breakers with enforcement turned off. Log everything. Let it run for a week or two against real traffic.
- Review the logs. Look at which argument patterns are most common, which tool chains appear frequently, and where the circuit breakers would have triggered. Identify false positives (legitimate calls that would have been blocked) and true positives (calls that should have been blocked).
- Write the first set of policies. Start with the highest-risk tools: anything that writes data, sends communications externally, or modifies permissions. Write argument validation for those first. Add chain patterns for the sequences you identified in the logs as clearly dangerous.
- Enforce incrementally. Turn on enforcement for the highest-risk policies first. Monitor for false positives. Adjust. Then expand to medium-risk tools. Then add chain analysis enforcement.
- Keep tuning. Your agent's usage patterns will change as you add new tools, new features, and new customer workflows. Review the audit logs monthly. Update policies when new tools are added. Reassess chain patterns when the agent's capabilities expand.
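The rollout above is easiest when each policy carries its own enforcement mode. A sketch of what that might look like (policy names and modes are illustrative):

```typescript
type Mode = "audit" | "enforce";

// Per-policy enforcement mode. Everything starts in "audit" (log only)
// and is promoted to "enforce" one policy at a time.
const policyModes: Record<string, Mode> = {
  "args.query_database": "enforce", // highest-risk: enforced first
  "args.send_email": "enforce",
  "args.write_file": "audit",       // still collecting false-positive data
  "chain.exfiltration": "audit",
};

// In audit mode a violation is logged but the call proceeds; in enforce
// mode it is blocked. Unknown policies default to audit, so a new rule
// never breaks production before its logs have been reviewed.
function applyVerdict(policy: string, violated: boolean): "allow" | "block" {
  if (!violated) return "allow";
  const mode = policyModes[policy] ?? "audit";
  console.log(`VIOLATION ${policy} (mode=${mode})`);
  return mode === "enforce" ? "block" : "allow";
}
```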
The goal is not to prevent the agent from doing its job. It's to ensure that the agent does its job within boundaries that a human has reviewed and approved. Least agency, not least capability.
What comes next
Across three guides, we've built a layered defense:
- Identity and authorization (Give your AI agents their own credentials): who can act, what are they authorized to do, and can you prove it later.
- Supply chain verification (How to vet the tools your AI agents depend on): can you trust the tools your agents connect to, and how do you detect when they change.
- Invocation policy (this article): when a trusted agent calls a trusted tool, are the arguments safe, is the sequence expected, and does the risk level require a human in the loop.
These three layers work together. Identity scoping limits who can act. Supply chain verification limits which tools they can call. Invocation policy limits how they call them. Skip any one and the other two can't fully compensate.
If you're building agentic applications and want to implement these controls without building auth infrastructure from scratch, WorkOS provides the identity layer: OAuth 2.1 for agent credentials, RBAC and FGA for tool-level and resource-level authorization, audit logging for every invocation decision, and native MCP server authentication. The invocation policy layer described in this article sits on top of that foundation, enforcing argument and chain rules at the middleware level using the identity and authorization signals WorkOS provides.