The token bill is an identity problem

Organizations are discovering that AI agent costs are invisible by design. The fix starts earlier in the stack than most teams realize.

Maria Paktiti

June 22, 2026

Uber deployed Claude Code to roughly 5,000 engineers in late 2025. By April 2026, the company had burned through its entire annual AI budget. Four months in. The CTO acknowledged being "back to the drawing board." Around the same time, a separate enterprise reportedly spent $500 million in a single month after deploying AI access without usage limits. A healthcare organization consumed one trillion tokens over six months and generated more than $6 million in unplanned costs before the finance team understood what was driving it.

These are not freak incidents. According to the FinOps Foundation's executive director, by April 2026, companies were describing existential crises: "We are 3x over our entire 2026 token budget and it's only April." The conversation, the Foundation reports, has shifted from "tokenmaxxing and go fast" to "we need guardrails."

The spend is real. The governance infrastructure to contain it does not yet exist. And the reason it does not exist is largely architectural.

Why token costs are invisible

Traditional enterprise software costs are predictable. Per-seat licenses, annual contracts, clearly defined tiers. Finance can model them, budget them, and report against them. Token consumption is structurally different in three ways that defeat every tool organizations already have.

The first is non-linearity. A chat assistant answers once. An agent reasons, calls a tool, reads the result, reasons again, and repeats. Gartner analysis puts agentic tasks at roughly 5 to 30 times the token consumption of equivalent chatbot interactions. A single user request can fan out into 10 or 20 model calls, with input tokens, not output, driving most of the bill.
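The arithmetic behind that fan-out can be sketched in a few lines. This is a minimal illustration, assuming each model call re-sends the full accumulated context and that no prompt caching applies; the function name and every token count below are invented for the example, not measurements.

```python
def agent_loop_tokens(system_tokens, user_tokens, steps,
                      tool_result_tokens, output_tokens_per_step):
    """Return (total_input, total_output) for an agent loop of `steps` model calls.

    Assumption: every call sends the whole context so far as input, and each
    step appends the model's output plus one tool result to that context.
    """
    context = system_tokens + user_tokens
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += context                  # full context billed as input
        total_output += output_tokens_per_step  # model reasoning / tool request
        context += output_tokens_per_step + tool_result_tokens
    return total_input, total_output


# Hypothetical workload: 2,000-token system prompt, 200-token request.
total_in, total_out = agent_loop_tokens(
    2000, 200, steps=10, tool_result_tokens=500, output_tokens_per_step=200)
chat_in, chat_out = agent_loop_tokens(
    2000, 200, steps=1, tool_result_tokens=500, output_tokens_per_step=200)

ratio = (total_in + total_out) / (chat_in + chat_out)
input_share = total_in / (total_in + total_out)
```

With these assumed numbers, ten steps consume 55,500 tokens against 2,400 for a single chat turn, about 23 times as much, and roughly 96 percent of the total is input.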

The second is invisibility to finance. The established cost-management stack was built to attribute cloud infrastructure: instances, clusters, storage. None of it sees a token. It cannot tell you that the marketing team's retrieval pipeline spent more last week than the entire engineering organization, because the data never reaches it.

The third is non-attribution. Token consumption does not appear on a purchase order. It does not map cleanly to a cost center. It is generated continuously and autonomously, often by dozens of agents running in parallel across business units and workflows. Most enterprise agent rollouts exceed their pilot budget by 4 to 11 times within the first 90 days of broad deployment, driven almost entirely by retrieval breadth and uncapped tool-call recursion.

Put those three together and you have a metered consumption obligation with no meter.

The identity layer is the meter

The emerging FinOps response to this problem focuses on instrumentation: real-time consumption monitoring at the workload level, business-unit chargeback for token consumption, ROI thresholds gating new AI initiatives. These are the right practices. But they depend on a prerequisite that most teams have not yet built: the ability to attribute every model call to the agent that made it, the user who authorized it, and the tool it was calling.

Attribution is an identity problem before it is a FinOps problem.

Consider what a well-instrumented agentic system needs to answer:

Which agent made this call?
On whose behalf was it acting?
What tool was it calling, and was that call authorized?
How does this call relate to the task session it belongs to?

These are the same questions that a proper authorization layer answers as part of enforcing access control. An agent that has been issued its own credential, scoped to specific tools through fine-grained authorization, operating under a session-limited token, leaves an audit trail as a side effect of normal authorization mechanics. The cost attribution problem does not need a separate system. It needs the authorization system to be doing its job.
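One way to picture attribution as a side effect of authorization is a check that writes its decision to the audit log whether or not the call is allowed. The sketch below is illustrative only: `AgentCredential`, `AuditLog`, `authorize_tool_call`, and the tool names are hypothetical and do not correspond to any particular product's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentCredential:
    agent_id: str             # which agent made the call
    on_behalf_of: str         # the user it is acting for
    session_id: str           # the task session the call belongs to
    allowed_tools: frozenset  # the tools this credential is scoped to


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)


def authorize_tool_call(cred, tool, audit_log):
    """Decide whether the call is allowed and record the decision either way."""
    allowed = tool in cred.allowed_tools
    audit_log.entries.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "agent": cred.agent_id,
        "user": cred.on_behalf_of,
        "session": cred.session_id,
        "tool": tool,
        "authorized": allowed,
    })
    return allowed


# Hypothetical usage: one allowed call, one denied call.
log = AuditLog()
cred = AgentCredential(
    agent_id="research-agent-42",
    on_behalf_of="m.chen@acme.com",
    session_id="task-2026-06-22-0941",
    allowed_tools=frozenset({"search_docs"}),
)
ok = authorize_tool_call(cred, "search_docs", log)
denied = authorize_tool_call(cred, "send_email", log)
```

Each log entry answers the four questions above without any telemetry code beyond the authorization check itself.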

Figure: A two-column diagram showing three authorization decisions on the left and the cost attribution data each one produces on the right. Per-agent credential produces per-agent spend. FGA tool-level scope produces per-tool cost breakdown. Session-scoped token produces per-task unit cost. A footer reads: attribution is a side effect of authorization, no separate telemetry build required.

What "doing its job" looks like

Most deployed agents today do not have dedicated credentials. They authenticate as the user who configured them, or they carry a service account key that has been granted whatever access was easiest to set up. The resulting audit log attributes every action to a human identity or a generic service account. You cannot tell which agent made which call. You cannot cap an individual agent's consumption. You cannot revoke one agent's access without also cutting off every other agent that shares the same key.

Figure: Two audit log entries side by side. The before entry, labeled 'shared service account', shows the agent, user, and session fields marked as unknown. The after entry, labeled 'per-agent credential', shows all fields populated: agent is research-agent-42, user is m.chen@acme.com, session is task-2026-06-22-0941. Both entries show the same token count of 12,400. A caption reads: token count is identical; attribution is what changes.

This is not a cost-governance failure. It is an authorization architecture failure that also produces a cost-governance failure.

A system built for agents looks different. Each agent has its own identity, separate from the user it acts on behalf of. Authorization is granted at the tool level, not the service level: an agent scoped to read invoices in the billing project can access everything inside that scope and nothing outside it. Sessions are time-limited: when the task ends, access ends, and no orphaned token can be reused. When an agent is decommissioned, its relationship to every resource it could access is revoked simultaneously.
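Tool-level grants and one-step decommissioning can be modeled as relationship tuples. The following is a toy in-memory sketch under that assumption; `RelationshipStore`, the agent names, and the resource strings are hypothetical, and a real fine-grained authorization service would expose its own API.

```python
class RelationshipStore:
    """Toy store of (agent, relation, resource) tuples."""

    def __init__(self):
        self._tuples = set()

    def grant(self, agent, relation, resource):
        self._tuples.add((agent, relation, resource))

    def check(self, agent, relation, resource):
        # Access exists only if this exact tuple was granted.
        return (agent, relation, resource) in self._tuples

    def decommission(self, agent):
        """Remove every tuple held by the agent; return how many were revoked."""
        revoked = {t for t in self._tuples if t[0] == agent}
        self._tuples -= revoked
        return len(revoked)


store = RelationshipStore()
store.grant("invoice-agent", "read", "billing:invoices")
store.grant("report-agent", "read", "billing:invoices")

can_read = store.check("invoice-agent", "read", "billing:invoices")
can_write = store.check("invoice-agent", "write", "billing:invoices")

revoked_count = store.decommission("invoice-agent")
still_reads = store.check("invoice-agent", "read", "billing:invoices")
other_agent_unaffected = store.check("report-agent", "read", "billing:invoices")
```

Because each agent holds its own tuples, removing one agent leaves every other agent's access untouched, which is exactly what a shared service account key cannot offer.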

The audit trail produced by this architecture is exactly the instrumentation that cost governance requires. Every model call is associated with an agent identity. Every agent identity is associated with the tool permissions it held during that call. Every session has a clear start and end, which maps naturally to task-level cost attribution. The telemetry that FinOps teams need to build chargeback systems does not need to be layered on afterward: it is already there.
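Given audit entries shaped like the populated log entry described earlier, chargeback views reduce to a group-by. This is a minimal sketch: the field names follow that description, while the `tool` field, the second and third rows, and the `spend_by` helper are assumptions added for illustration.

```python
from collections import defaultdict


def spend_by(entries, key):
    """Sum token counts across audit entries, grouped by the given field."""
    totals = defaultdict(int)
    for entry in entries:
        totals[entry[key]] += entry["tokens"]
    return dict(totals)


# Hypothetical audit entries; only the first mirrors the example in the text.
entries = [
    {"agent": "research-agent-42", "user": "m.chen@acme.com",
     "session": "task-2026-06-22-0941", "tool": "search_docs", "tokens": 12400},
    {"agent": "research-agent-42", "user": "m.chen@acme.com",
     "session": "task-2026-06-22-0941", "tool": "fetch_page", "tokens": 3100},
    {"agent": "billing-agent-7", "user": "a.ruiz@acme.com",
     "session": "task-2026-06-22-1010", "tool": "read_invoices", "tokens": 5000},
]

per_agent = spend_by(entries, "agent")      # per-agent spend
per_tool = spend_by(entries, "tool")        # per-tool cost breakdown
per_session = spend_by(entries, "session")  # per-task unit cost
```

The same three groupings are unavailable when every entry carries a shared service account in place of the agent, user, and session fields.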

The MCP gap and what it means for cost

The Model Context Protocol does not address agent-level attribution at the protocol level. The MCP 2026 roadmap acknowledges that rate limiting and cost attribution are unsolved problems. Static client secrets remain common in production. Many teams run agents without the session-scoped access patterns that would make tool-level attribution possible.

This matters for cost governance in a direct way. Without tool-level scoping, you cannot cap what a specific agent can call. Without session-scoped tokens, you cannot define a task boundary that maps to a unit-cost measurement. Without per-agent credentials, you cannot do chargeback at the agent level, only at the model-API level, which tells you what you spent but not where it went.

The practical implication for teams building agentic systems now is to make the authorization architecture decisions early. The teams that will avoid the Uber problem are not the ones that add FinOps dashboards after the fact. They are the ones whose authorization layer was designed to produce the right data from the beginning.

What to instrument before you scale

Three instrumentation decisions have the highest impact on cost visibility, and all three are decisions you make at the authorization layer:

Agent identity. Issue distinct credentials to each agent, separate from human user credentials and from other agents. This is the single most important change. Without it, no downstream attribution system can work.
Tool-level scoping. Use fine-grained authorization to scope each agent to the specific tools it needs. An agent that can only call the tools it has been explicitly granted access to cannot generate token spend on anything else. The scope constraint is also the cost constraint.
Session boundaries. Design agent workflows around task sessions with explicit start and end points. Session-scoped tokens, which expire when the task completes, provide the natural boundary for per-task cost attribution. They also prevent the most common cause of runaway spend: an agent that continues to make calls against a persistent credential long after the user-visible task has completed.
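A session boundary can be sketched as a token object that stops authorizing spend once the task is closed or its lifetime runs out. The per-session token budget in this example is an added assumption that illustrates how a cap could hang off the same boundary; `SessionToken` and its parameters are hypothetical names.

```python
import time


class SessionExpired(Exception):
    pass


class BudgetExceeded(Exception):
    pass


class SessionToken:
    """Session-scoped credential with an expiry and an assumed token budget."""

    def __init__(self, agent_id, session_id, ttl_seconds, token_budget,
                 now=time.monotonic):
        self.agent_id = agent_id
        self.session_id = session_id
        self.spent = 0                     # per-task spend, read at session end
        self._now = now                    # injectable clock for testing
        self._expires_at = now() + ttl_seconds
        self._remaining = token_budget
        self._closed = False

    def close(self):
        """Mark the task as finished; no further spend is authorized."""
        self._closed = True

    def charge(self, tokens):
        """Authorize `tokens` of spend and return the remaining budget."""
        if self._closed or self._now() >= self._expires_at:
            raise SessionExpired(self.session_id)
        if tokens > self._remaining:
            raise BudgetExceeded(self.session_id)
        self._remaining -= tokens
        self.spent += tokens
        return self._remaining


# Hypothetical usage with a fake clock so the behavior is deterministic.
clock = [0.0]
token = SessionToken("research-agent-42", "task-2026-06-22-0941",
                     ttl_seconds=60, token_budget=1000, now=lambda: clock[0])
remaining = token.charge(400)
```

Once `close()` is called or the lifetime elapses, every further `charge` raises `SessionExpired`, so an agent cannot keep spending against the credential after the task is over, and `spent` gives the per-task total.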

These decisions compound. An agent with its own credential, scoped to specific tools, operating under a session-limited token, produces cost data as a structural property of its operation, not as an afterthought. The finance team's question, "what did that agent spend on that workflow?", becomes answerable from the authorization system's audit logs rather than requiring a separate telemetry build.

The governance stack is being built

The FinOps Foundation launched a tokenomics working group in mid-2026 to define canonical frameworks for token attribution and cost measurement. Gartner projects worldwide AI spending to grow roughly 47 percent this year, with agentic inference as the largest contributor. The governance infrastructure is assembling in real time.

The organizations that will scale agents without budget crises are not waiting for standards to settle. They are building the authorization layer now, before the adoption curve outruns their ability to see what they are running.

The token bill is high. The audit log is empty. That is an identity problem, and it is solvable.

Governing agent token spend with WorkOS

WorkOS provides the authorization primitives that make agent cost attribution possible. AuthKit issues per-agent OAuth 2.1 credentials with session-scoped tokens so every model call is tied to a distinct, verifiable agent identity rather than a shared service account. Fine-Grained Authorization scopes each agent to the specific tools it is authorized to call, so the access boundary and the cost boundary are the same thing: an agent that cannot call a tool cannot spend tokens on it. Audit Logs capture every agent action with the credential, tool, and session that produced it, giving finance and engineering teams the per-agent, per-workflow attribution data that chargeback systems require. The three instrumentation decisions described in this article sit directly on top of that foundation.
