Flipping the flow: How MCP sampling lets servers ask the AI for help
Explore how MCP transforms server logic with AI-powered completions, human approvals, and transparent workflows.
What if your backend could ask the AI for help, on its own terms, when it needs it?
That’s the promise of MCP sampling: a protocol-level mechanism that flips the usual flow of language model requests.
Instead of the client always initiating prompts to the server, MCP lets the server request a completion from the client, creating a more powerful and programmable pattern for using AI inside your systems.
It’s not just about sending a prompt. It’s about giving servers agency, bringing humans into the loop, and enabling structured, AI-assisted workflows that are transparent, controllable, and auditable.
What is MCP sampling?
In most traditional setups, a client (usually a browser or mobile app) sends a prompt to an API. The server acts as a middleman—or a dumb pipe.
With MCP sampling, that dynamic is reversed. The server initiates the request and says: “Hey, I need a model to help me complete this task.”
This request is routed to an MCP client—which might be an automated AI agent, a workflow orchestrator, or a human-in-the-loop interface. And here’s where it gets powerful: The user is always in the loop.
Before the request is sent to the model, the user (typically a human operator or reviewer) can:
- See the full message that’s about to be sent.
- Edit the prompt, correct mistakes, or fine-tune the tone.
- Reject the request entirely, if something looks off.
After the model returns a completion, the same thing happens:
- The user can review the output.
- Edit or improve it, especially for quality or safety.
- Or reject it before it’s passed back to the server.
This built-in review loop makes MCP sampling ideal for use cases where accuracy, transparency, or human judgment is essential, like content moderation, customer support, or decision-making systems.
By flipping the traditional flow and adding two layers of human oversight, MCP sampling opens the door to richer, safer, and more collaborative AI workflows.
Why flip the flow?
This change in control unlocks three major benefits:
- Server-initiated reasoning: Servers can call on the model only when needed—like during a decision point in a workflow or when parsing ambiguous data.
- Human review: MCP sampling supports two levels of human-in-the-loop control: before the request is sent to the model, and after the model responds but before it's returned. Perfect for high-stakes outputs like legal summaries, financial calculations, or user-facing messages.
- Structured, repeatable AI use: Each sampling request follows a structured protocol format, making it observable, traceable, and versionable—unlike traditional freeform prompts.
How sampling works in MCP
Here’s the flow, step by step:
- The MCP server decides that a model completion is needed to move forward in a task, make a decision, or complete a piece of content. It sends a `sampling/createMessage` request to the client.
- The server doesn't call the model directly. Instead, the client acts as the intermediary between the server, the user, and the model: it shows the request to the user, forwards it to the LLM, and routes responses back. This client can be an automated AI handler, a queue-based async system, or a human reviewer UI.
- Before the request goes to the model, a human can preview and edit it. This is ideal for safety, correctness, and transparency. Once approved, the message is compiled and sent to the model using the standardized MCP message format.
- The LLM returns a response based on the provided context, system prompt, preferences, and sampling parameters. A human can inspect the completion before it's sent back to the server. If it looks good, it's approved. If not, it can be flagged, edited, or resampled.
- The server receives the approved response and continues its logic: using the completion to make a decision, update a record, send a message, and so on.
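The loop above can be sketched in TypeScript. This is a self-contained simulation, not real MCP plumbing: `humanApproves` and `callModel` are hypothetical stand-ins for the client's review UI and the actual LLM call, and the request shape follows the `sampling/createMessage` fields described later in this post.

```typescript
// Simulated sampling loop: server builds a request, the client gates it
// through human review, forwards it to a model, and gates the response.
type Message = { role: "user" | "assistant"; content: { type: "text"; text: string } };

interface CreateMessageRequest {
  method: "sampling/createMessage";
  params: { messages: Message[]; systemPrompt?: string; maxTokens: number };
}

// 1. The server decides it needs a completion and builds the request.
const request: CreateMessageRequest = {
  method: "sampling/createMessage",
  params: {
    messages: [{ role: "user", content: { type: "text", text: "Summarize this ticket." } }],
    systemPrompt: "You are a support triage assistant.",
    maxTokens: 200,
  },
};

// 2-3. Stand-in for the client's review UI: a human approves or rejects.
function humanApproves(_item: unknown): boolean {
  return true; // a real client would block here until the operator acts
}

// Stand-in for forwarding the approved request to an LLM.
function callModel(_req: CreateMessageRequest): Message {
  return { role: "assistant", content: { type: "text", text: "Billing-related issue" } };
}

// 4-5. The completion is reviewed again, then returned to the server.
let result: Message | null = null;
if (humanApproves(request)) {
  const completion = callModel(request);
  if (humanApproves(completion)) result = completion; // second review gate
}
console.log(result?.content.text);
```

In a real client, each `humanApproves` call would surface the payload in a review interface; the two call sites correspond to the two oversight layers described above.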

This structured loop enables:
- Automated workflows with manual checkpoints
- Reproducible, auditable AI decisions
- Human + AI collaboration on a per-task basis
MCP sampling isn’t just “prompting.” It’s programmable delegation—with the AI, server, and human all playing a role.
Example use case: Automated support triage
Imagine a customer support backend receives a new ticket:
- The server detects that it’s unclear which team owns the issue.
- It sends a sampling request to the MCP client: “Given this text, what category does this fall into?”
- A human reviewer sees the message, approves it, and submits it to the LLM.
- The model replies: “Billing-related issue”
- The human approves the output, and the server routes the ticket accordingly.
No prompt engineering dashboards. No guesswork. Just structured, supervised AI-as-a-service, built into your system.
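On the wire, the triage request above might look like the sketch below; the ticket text, category list, and prompt wording are all illustrative.

```typescript
// Illustrative sampling request for the triage example; the shape
// follows the sampling/createMessage format, the values are made up.
const triageRequest = {
  method: "sampling/createMessage",
  params: {
    messages: [
      {
        role: "user",
        content: {
          type: "text",
          text:
            "Given this ticket, which category does it fall into " +
            "(billing, technical, sales)?\n\n" +
            "Ticket: 'I was charged twice this month.'",
        },
      },
    ],
    systemPrompt: "You classify support tickets into exactly one category.",
    includeContext: "none", // the ticket text alone is enough context
    maxTokens: 20,          // we only need a short label back
  },
};
console.log(JSON.stringify(triageRequest, null, 2));
```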
The request format
Each sampling request includes a standardized payload:
Messages
An array of messages in the familiar chat format. Each message includes:
- `role`: One of `"user"`, `"assistant"`, or potentially `"system"`, used to guide the model's behavior.
- `content`: An object describing the message body:
  - `type`: `"text"` or `"image"`.
  - `text`: The actual string content (required for `type: "text"`).
  - `data`: Base64-encoded image data (used with `type: "image"`).
  - `mimeType`: MIME type for the image (e.g., `"image/png"`).
Use this field to structure conversations, few-shot examples, or multimodal interactions.
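For instance, a multimodal request might mix a text turn with an image turn; in this sketch the base64 payload is a truncated placeholder, not a real image.

```typescript
// Example messages array mixing a text turn with an image turn.
type Content =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

type Message = { role: "user" | "assistant"; content: Content };

const messages: Message[] = [
  { role: "user", content: { type: "text", text: "What error is shown in this screenshot?" } },
  {
    role: "user",
    content: {
      type: "image",
      data: "iVBORw0KGgoAAAANSUhEUg", // placeholder base64, not decodable
      mimeType: "image/png",
    },
  },
];
console.log(messages.length);
```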
System prompt
A dedicated system-level instruction string (e.g., `"You are a helpful assistant"`), used by many models to set behavior or tone. Think of this as a global instruction that sits outside the turn-based message array.
Model preferences
A way to guide which model to use and how to balance trade-offs:
- `hints`: Suggestions for which models or providers to prefer (e.g., `"gpt-4"` or `"claude-3"`).
- `costPriority`: Number from `0` to `1` indicating how important it is to minimize cost.
- `speedPriority`: How important latency is to you.
- `intelligencePriority`: Preference for model capability (e.g., reasoning, understanding).
Great for balancing cost vs. quality vs. latency in dynamic environments.
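As a sketch, a preferences object that leans toward cheap, fast models might look like this. Note that the MCP spec expresses `hints` as objects with a `name` field; the model name and the numbers here are illustrative.

```typescript
// Illustrative modelPreferences: favor low cost and latency over raw
// capability, while hinting (not demanding) a model family.
const modelPreferences = {
  hints: [{ name: "claude-3" }], // a suggestion; the client picks the model
  costPriority: 0.8,             // minimizing cost matters most here
  speedPriority: 0.6,            // latency matters somewhat
  intelligencePriority: 0.2,     // raw capability matters least
};
console.log(modelPreferences);
```

Because these are priorities rather than hard requirements, the client stays free to honor, adjust, or ignore them based on what models it actually has access to.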
Include context
Specifies how much of the server-side or session context to include:
- `"none"`: Only the provided messages are sent.
- `"thisServer"`: Include context local to the current server.
- `"allServers"`: Include any federated/global session context.
Useful for multi-turn conversations or agent-like behavior.
Sampling parameters
- `temperature`: A float (usually between `0.0` and `1.0`) controlling randomness in the output. Lower values mean more deterministic results. Set this low for consistent behavior or high for creative generation.
- `maxTokens`: The maximum number of tokens to generate in the response. Controls the length and cost of the output, which matters for budget- or latency-sensitive applications.
- `stopSequences`: Optional list of string sequences that will cause the model to stop generating further tokens. Useful for enforcing format constraints or ending at logical points (like `"\n\n"`).
- `metadata`: A freeform object for storing custom values, such as:
  - Experiment IDs
  - User session IDs
  - Timestamps or tags
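Putting the fields together, a complete set of `sampling/createMessage` params might look like the sketch below; every concrete value (prompt text, priorities, the `experimentId` tag) is illustrative.

```typescript
// One illustrative params object combining the fields described above.
const params = {
  messages: [
    {
      role: "user",
      content: { type: "text", text: "Extract the company name, issue type, and urgency." },
    },
  ],
  systemPrompt: "You are a careful data-extraction assistant.",
  includeContext: "thisServer", // also send this server's session context
  modelPreferences: { costPriority: 0.3, speedPriority: 0.4, intelligencePriority: 0.9 },
  temperature: 0.2,             // low: consistent, repeatable extractions
  maxTokens: 150,
  stopSequences: ["\n\n"],      // stop at the first blank line
  metadata: { experimentId: "extract-v1" }, // hypothetical tracking tag
};
console.log(JSON.stringify(params, null, 2));
```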
When to use MCP sampling
MCP sampling isn’t just a neat protocol feature; it’s a tool for designing AI systems that are structured, reviewable, and safe by design. Below are common and powerful use cases where MCP sampling excels:
- Structured data extraction: Imagine your backend receives a flood of customer feedback, complaint tickets, or scraped data. The information is rich but unstructured—buried in messy text. With MCP Sampling, your server can initiate a model request asking: “Can you extract the company name, issue type, and urgency?” Before the response is stored or used, a human can preview the structured output, edit it if necessary, and approve it.
- Decision-making workflows: Sometimes, your system reaches a moment where logic isn’t clear-cut—maybe it’s deciding which department should handle a vague support ticket, or whether a user query fits into sales or technical support. Instead of hardcoding brittle logic trees, your server can pause, formulate a structured question for the model, and send a sampling request. The model makes a suggestion, a human reviews it, and the workflow moves forward with confidence. It's AI-enhanced decision-making, but still under your control.
- Form completion: Let’s say you’re building an onboarding flow. You’ve collected half the data—a user’s role, company size, maybe a few goals—but some fields are still blank. Rather than blocking progress or forcing guesses, the server can generate a draft completion using the model: “Based on this profile, what onboarding plan should we recommend?” MCP lets you preview and approve the result, helping fill in gaps with intelligence instead of guesswork.
- Human-reviewed pipelines: In high-stakes domains (support automation, user messaging, moderation), you often need a human in the loop. But that doesn't mean you want manual labor at every step. MCP sampling gives you the best of both worlds: servers can request completions automatically, and users can review or refine both the prompt and the response before anything is finalized. It’s a scalable way to combine AI speed with human judgment, without bottlenecks or blind trust.
- AI-assisted testing: When testing your application logic, edge cases and unpredictable behavior are always lurking. MCP Sampling lets your system generate hypothetical inputs or probe edge scenarios by asking the model for ideas or responses: “What unusual ways might a user phrase this question?” The server handles the sampling; the user can vet the completions. This turns the model into a creative testing companion—one that helps you build smarter, more resilient systems.
Final thoughts
MCP sampling isn’t just a way to generate completions; it’s a shift in how AI interacts with systems.
- It gives servers the power to think and ask.
- It gives humans the ability to observe and approve.
- And it gives teams the ability to build AI flows that are scalable, transparent, and accountable.
If you’ve ever wished your system could “ask the AI a question”—on its own, at the right time, with oversight built in—this is how. Whether you're extracting insights, making decisions, filling gaps, or stress-testing workflows, MCP gives you a way to do it all with structure, control, and trust.