March 18, 2026

Model Routing vs Tool Routing: How to give your AI agents superpowers

Everyone thinks AI routing means swapping models. The bigger game is tool routing — giving your agent image gen, video, voice, and search via MCP and skills.

Everyone thinks "routing" in AI means swapping models. You spin up OpenRouter, point your agent at Claude for hard problems and DeepSeek for cheap ones, and call it a day.

That's model routing, and it matters. But there's a second kind of routing that transforms what your agents can do: tool routing.

Once you understand the distinction, you'll never look at your AI setup the same way.

The two kinds of routing

Model routing is about which LLM brain powers your agent. Text goes in, text comes out. You swap between Opus for complex reasoning, MiniMax for fast chat, DeepSeek for budget-friendly tasks.

The variables you're optimizing are cost, speed, and intelligence of the thinking layer. This is what OpenRouter and Claude Code Router do — they let you pick the right model for the job.

Tool routing is about what capabilities your agent can invoke. This is how a text-based agent suddenly generates images via Gemini, transcribes audio via Whisper, creates videos via Replicate, or publishes content to your CMS via Webflow.

The agent's LLM brain orchestrates, but the tools do the specialized work. This happens through MCP servers, skills, and API calls.

Here's why this matters: model routing changes how well your agent thinks. Tool routing changes what your agent can do. And in practice, the latter is far more powerful.

Model routing in practice

Setting up model routing is straightforward. If you're using Claude Code with OpenRouter, it's a single environment variable:

export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"

Now you can route your requests through any model OpenRouter supports — hundreds of them, from frontier models to fine-tuned specialists. Claude Code Router takes this further by letting you select different models for different task types within the same session.
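A minimal sketch of what per-task model selection looks like as code — the kind of policy a tool like Claude Code Router applies under the hood. The model IDs and task labels here are illustrative assumptions, not a definitive list of what OpenRouter exposes:

```python
# Per-task model routing: map a task type to a model ID, fall back to
# a sensible default. Model IDs below are illustrative placeholders.

ROUTES = {
    "architecture": "anthropic/claude-opus-4",  # deep reasoning
    "chat":         "minimax/minimax-m1",       # fast, conversational
    "boilerplate":  "deepseek/deepseek-chat",   # cheap bulk edits
}

DEFAULT_MODEL = "anthropic/claude-sonnet-4"

def route_model(task_type: str) -> str:
    """Return the model ID to use for a given task type."""
    return ROUTES.get(task_type, DEFAULT_MODEL)
```

The whole trick is that the routing layer is just a lookup: the request payload stays identical, only the model field changes.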

The economics are real. You might run Opus for architectural decisions and code review, then drop to a cheaper model for boilerplate generation and simple edits. I've seen setups where mixing models cuts costs by 10-17x without meaningful quality loss on routine tasks.
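The savings are easy to back-of-envelope. Assuming illustrative per-million-token prices (not current quotes), sending only a small slice of traffic to the premium model lands squarely in that 10-17x range:

```python
# Back-of-envelope blended cost for a mixed-model setup.
# Prices are illustrative $/M-token rates, not current quotes.

OPUS_PER_M = 15.00   # premium model
CHEAP_PER_M = 0.25   # budget model

def blended_cost(total_m_tokens: float, premium_share: float) -> float:
    """Total cost when `premium_share` of tokens go to the premium model."""
    premium = total_m_tokens * premium_share * OPUS_PER_M
    cheap = total_m_tokens * (1 - premium_share) * CHEAP_PER_M
    return premium + cheap

all_opus = blended_cost(100, 1.0)   # everything on the premium model
mixed = blended_cost(100, 0.05)     # 5% premium, 95% budget
print(f"{all_opus / mixed:.1f}x cheaper")  # roughly 15x
```

The ratio is dominated by how much traffic genuinely needs the premium model, which is why triage quality matters more than the exact price list.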

But at the end of the day, model routing is still text in, text out. You're optimizing the brain, not the body.

Tool routing in practice

Tool routing is where things get interesting. This is how you give your agent hands.

In my OpenClaw setup, MiniMax handles the chat layer via OpenRouter — that's model routing. But the agent also has skills for email sweep, blog drafting, image generation, and calendar management. That's tool routing, and it's what makes the agent actually useful beyond conversation.

With Claude Code, you can do both simultaneously. Route through OpenRouter for the LLM layer, and install skills for specialized capabilities: image generation via Gemini, transcription via Whisper, video creation via Replicate, publishing via Webflow.

Each of these is a tool the agent can invoke. The LLM doesn't need to be multimodal itself. It just needs to be smart enough to know when and how to call the right tool.

The mechanism is MCP (Model Context Protocol) servers and skills installed in your .claude/skills/ directory. Each skill is a capability boundary — it defines what the agent can do, what parameters it needs, and how to call the underlying API.
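To make the capability boundary concrete, here's a sketch of what a skill can look like on disk: a directory under .claude/skills/ with a SKILL.md whose frontmatter tells the agent what the capability is and when to use it. The skill name, wording, and layout below are illustrative, not a definitive spec:

```
.claude/skills/generate-image/SKILL.md

---
name: generate-image
description: Generate an image from a text prompt via an image API.
  Use when the user asks for illustrations, diagrams, or cover art.
---

Call the image generation API with the user's prompt and save the
result to the working directory.
```

The frontmatter is what the agent reads to decide whether a skill applies; the body is the instructions it follows once it picks one.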

The ecosystem explosion

The roster of specialized services you can wire into your agents (Replicate, Together.ai, Arcade.dev, and more) is growing fast. Here's what the landscape looks like right now:

Image generation: Gemini, DALL-E, Flux, Stable Diffusion — each with different strengths for photorealism, illustration, speed, and cost.

Video generation: Runway, Kling, Sora — text-to-video and image-to-video, rapidly improving in quality.

Voice and audio: Whisper for transcription, ElevenLabs and OpenAI TTS for speech synthesis. Your agent can listen and talk.

Code execution: E2B and Modal give your agent sandboxed environments to run code, test hypotheses, and return results.

Search and retrieval: Tavily, Exa, and Brave Search let your agent pull in real-time information from the web.

Infrastructure and publishing: Webflow, Linear, GitHub — your agent can manage projects, publish content, and ship code.

Every one of these is a tool an agent can call. You don't need a single model that does everything. You need a smart orchestrator with access to good tools. This is the same pattern that made Unix powerful — small, sharp tools composed together.
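The orchestrator pattern itself is small. A hedged sketch, with made-up tool names and stub handlers standing in for real service calls:

```python
# Minimal tool-routing core: the LLM picks a tool by name, and a
# registry dispatches the call. Handlers here are stand-ins for real
# service bindings (search APIs, image APIs, etc.).

from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {}

def tool(name: str):
    """Decorator that registers a function as an invocable tool."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("search")
def search(query: str) -> str:
    return f"results for: {query}"   # would call Tavily/Exa/Brave

@tool("generate_image")
def generate_image(prompt: str) -> str:
    return f"image for: {prompt}"    # would call Gemini/Flux/etc.

def dispatch(tool_name: str, argument: str) -> str:
    """Route the model's tool choice to the matching handler."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](argument)
```

Adding a capability is one decorator; the orchestration loop never changes. That's the Unix-style composition in miniature.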

How to set this up

Here's the practical setup, layer by layer:

1. Model routing. Pick your LLM provider and configure the base URL. If you want model flexibility, use OpenRouter. If you want to stay in the Anthropic ecosystem with per-task model selection, use Claude Code Router. Either way, this is one environment variable or config change.

2. Tool routing. This is where you invest time. Install skills into .claude/skills/ for capabilities you use regularly. Connect MCP servers for services that need persistent connections. Add API keys for specialized services like image generation, search, and voice.

3. Agent teams. For complex workflows, set up a lead agent that coordinates teammates. Each agent in the team has access to the full tool suite. The lead agent breaks down the work, delegates to teammates, and assembles the results.

Agent teams are currently triggered by prompting Claude Code directly: tell it you want an agent team spawned for a given task.

The compounding effect is real. Each new tool you add doesn't just add one capability — it adds every combination of that tool with your existing tools. Image generation plus CMS publishing means your agent can create and publish illustrated articles. Search plus code execution means your agent can research a topic and build a working prototype.

Where agent teams fit (and the current limitation)

Agent teams are the natural extension of tool routing. Instead of one agent juggling everything, you have specialized teammates — one handling research, another generating visuals, a third managing the publish pipeline.

Note that in this article, I'm speaking about the Anthropic-specific agent team capability recently added to Claude Code.

There's a practical limitation worth noting: agent teams currently all run the same model (Opus). The community has requested per-agent model selection — imagine routing your research agent through a cheap, fast model while your reasoning agent runs on Opus.

That's coming, but it's not here yet. For now, model routing is per-session, not per-agent. Tool routing, on the other hand, is already unlimited. Every agent in the team can call any tool you've configured.

The punchline

The most capable AI setup isn't the one running the most expensive model. It's the one with a good-enough orchestrator and excellent tools.

A $0.25/M token model with access to 20 specialized APIs beats a $15/M token model with no tools. Every time. The model provides judgment and orchestration. The tools provide capability.

Most people are optimizing the brain when they should be building the body. Model routing matters, but tool routing is the multiplier. Start wiring up your tools.
