May 30, 2025

How MCP servers work: Components, logic, and architecture

A behind-the-scenes look at the core components of an MCP server — from request handling and session orchestration to caching and context stores.

Modern AI assistants are only as powerful as the data and tools they can access.

The Model Context Protocol (MCP) is an open standard introduced by Anthropic to bridge AI models with data sources and services. In simple terms, MCP defines a common client–server architecture where AI applications (the clients) connect to MCP servers that expose data, tools, or other capabilities in a standardized way.

Instead of writing custom integrations for every database, API, or enterprise system, developers can rely on MCP’s uniform interface, much like using a universal “AI USB port” to plug an AI into any data source.

In this article, we will take a tour through the architecture of an MCP server, explaining each of its major components and the role they play in connecting AI to valuable context.

Overview of MCP architecture

At a high level, MCP involves three roles: the host, the client, and the server.

  • The host is the AI-powered application (for example, a chat assistant or AI-enhanced IDE) with which the end-user interacts.
  • Within the host lives the client, which handles the MCP protocol on the app’s side and maintains a dedicated connection to a server.
  • The server is a separate program or service that provides specific capabilities (access to a database, a web search tool, a file system, etc.) via the standardized MCP interface.

This is how these three roles interact in practice:

  1. The host interprets user requests and decides which server’s capability is needed. For example, if a user asks “How many customers signed up today?”, the host’s client might route this to an analytics MCP server.
  2. The client sends a request using JSON-RPC to the appropriate server.
  3. The server executes the action and returns the results to the client. For example, it queries a database and returns the number.
  4. The client passes the answer to the host, which uses it to formulate a final answer for the user.
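
To make this concrete, here is roughly what steps 2 and 3 look like on the wire, shown as Python dictionaries. The getSignupCount tool name and its payload are invented for illustration; the surrounding field names follow MCP's tools/call convention.

# Step 2: the client asks the analytics server to run a tool.
# "getSignupCount" is a hypothetical tool name used for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "getSignupCount",          # which tool to invoke
        "arguments": {"period": "today"},  # tool-specific input
    },
}

# Step 3: the server runs the query and replies with a matching id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "42 customers signed up today."}],
    },
}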

This division of labor keeps the AI model focused on reasoning and language, while the MCP server focuses on action and data retrieval.

It’s a clean separation: the host+client handles user interaction and AI reasoning, and the server handles tool execution and data access.

With the big picture in mind, let’s zoom in on the MCP server itself. An MCP server’s architecture can be broken down into several key components:

  • Communication layer
  • Request handlers
  • Context stores
  • Session orchestrators
  • Caching layers

Below we explore each component and its role in making MCP tick.

Communication layer

At the heart of every MCP server is the communication layer that speaks the MCP protocol. Under the hood, MCP uses JSON-RPC 2.0 as the message format for all exchanges.

JSON-RPC is a lightweight, stateless protocol for remote procedure calls (RPC) that uses JSON to encode messages. It enables a client to execute methods on a server as if calling local functions.

When an MCP client first connects to a server, there is an initial handshake where the server advertises what it can do. The server sends back its protocol version and a list of supported capabilities (more on capabilities shortly). This capability negotiation is built into the protocol – the client learns what resources, tools, or prompts the server offers and can adjust its usage accordingly. The protocol logic ensures both sides agree on message formats and feature sets before doing any heavy lifting.
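
Abridged, that handshake looks something like the following, shown here as Python dictionaries. The exact capability fields vary by server, so treat the values as illustrative.

# The client opens the session and states what it supports.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "1.0.0"},
    },
}

# The server answers with its protocol version and advertised capabilities.
initialize_response = {
    "jsonrpc": "2.0",
    "id": 0,
    "result": {
        "protocolVersion": "2025-03-26",
        "capabilities": {"tools": {}, "resources": {}, "prompts": {}},
        "serverInfo": {"name": "example-server", "version": "0.1.0"},
    },
}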

MCP servers maintain a stateful connection with the client over the session. Whether the transport is a local pipe (STDIO), an HTTP stream, or a WebSocket, the connection stays open, allowing a series of back-and-forth interactions with shared context. This is how the server can carry context from one request to the next.

For example, if the server just returned page 1 of a large document, the next request for page 2 can be understood in context. The transport itself might vary – STDIO for local tools or Server-Sent Events/HTTP for remote servers – but all transports ultimately ferry JSON-RPC messages in both directions. The communication component of the server abstracts these details, so the rest of the server logic can deal with high-level requests and not worry whether the bits arrived via a pipe or a network socket.

In summary, the protocol handling component is the MCP server’s front door and telephone line: it listens for incoming JSON-RPC calls, translates them into internal function calls, and sends back JSON-RPC responses. It upholds the MCP specification (e.g., formatting messages properly, managing IDs for request-response pairs, handling errors) so that the server and client stay in sync.

Request handlers

An MCP server isn’t very useful until it actually does something in response to requests. That’s where request handlers come in. These are the functions or methods within the server that execute specific actions when a corresponding MCP request is received. In essence, each capability the server provides is backed by a request handler that implements it.

MCP defines three types of server capabilities:

  • Resources: These are endpoints for information retrieval. A resource handler might return data from a database, fetch a document, or list items in cloud storage. Importantly, resources are read-only or passive – they provide data but typically do not cause side effects. For example, an MCP server for a knowledge base might have a resource called searchArticles that takes a query and returns relevant articles from a corpus. The server’s request handler for searchArticles would handle the JSON-RPC request by executing the search and formatting the results to send back. Resources often need to handle large data (files, big query results), so their handlers might support streaming chunks or pagination for efficiency.
  • Tools: These are active operations that can perform side effects or computations. A tool handler might create a new record in a database, send an email, invoke an external API, or even control something like a web browser. Tools enable the AI to act on the world (within limits). For instance, an MCP server could expose a tool sendSlackMessage that, given a channel and message, posts to Slack. The request handler for this tool would contain the logic to call Slack’s API and return a success/failure result. Because tools can have effects, their handlers usually include validations and safety checks. They also define input parameters and output schema clearly, so the AI (and client) know how to use them.
  • Prompts: These are a bit unique – they are reusable prompt templates or workflows that the server can provide to guide interactions with the AI model. Essentially, a prompt capability might supply a pre-defined prompt or chain-of-thought that the AI can use. For example, a server could have a prompt called sqlQueryTemplate that helps an AI format a database query request consistently. The request handler for a prompt might not hit an external system at all, but instead return a carefully crafted template or even initiate a multi-step workflow involving the model (some advanced MCP servers use prompt capabilities to orchestrate complex interactions). Prompt handlers ensure the template variables are filled and the final prompt is delivered to the AI client.

Each handler is tied to a specific method or capability name. In practice, MCP defines a few generic methods; for example, a tools/call request carries the target tool's name (such as sendSlackMessage) in its parameters. When a request comes in, the server's protocol layer routes it to the right handler. That function runs, returns a result (or an error), and the server sends that back to the client.
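
Conceptually, this routing is just a lookup table from names to functions. The SDKs handle it for you; the hand-rolled sketch below, with hypothetical names throughout, only illustrates the idea.

from typing import Any, Callable

# Hypothetical registry mapping tool names to handler functions.
TOOL_HANDLERS: dict[str, Callable[..., Any]] = {}

def tool(name: str):
    """Register a function as the handler for a named tool."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        TOOL_HANDLERS[name] = func
        return func
    return decorator

@tool("sendSlackMessage")
def send_slack_message(channel: str, message: str) -> dict:
    # A real handler would validate inputs and call Slack's API here.
    return {"ok": True, "channel": channel, "length": len(message)}

def dispatch_tool_call(params: dict) -> Any:
    """Route a tools/call request to the registered handler."""
    handler = TOOL_HANDLERS[params["name"]]
    return handler(**params.get("arguments", {}))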

MCP servers also include built-in handlers for capability discovery – a way for the client to ask “what can you do?”. After the initial handshake, the client can call discovery methods such as tools/list, resources/list, or prompts/list, and the server returns a catalog of its available tools, resources, and prompts (including schemas and sometimes examples).
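
An abridged tools/list response for the Slack tool from earlier might look like this; the description and schema shown are illustrative.

# Abridged tools/list response; description and schema are illustrative.
list_tools_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "tools": [
            {
                "name": "sendSlackMessage",
                "description": "Post a message to a Slack channel.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "channel": {"type": "string"},
                        "message": {"type": "string"},
                    },
                    "required": ["channel", "message"],
                },
            }
        ]
    },
}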

Request handlers are implemented in whatever language the server is written in (Python, JS, etc.), using SDKs or frameworks that MCP provides to simplify this mapping.

Example: Bookmark manager

Let’s look at an example MCP server implementing a simple bookmark manager. It shows how to create and retrieve bookmarks, and it also provides a prompt for giving saved links meaningful titles.

  
from pathlib import Path
import json

from mcp.server.fastmcp import FastMCP

BOOKMARKS_FILE = Path("bookmarks.json")
if not BOOKMARKS_FILE.exists():
    BOOKMARKS_FILE.write_text("[]", encoding="utf-8")

app = FastMCP("MCP Bookmarks Server")

# Utility to load/save bookmarks
def load_bookmarks():
    return json.loads(BOOKMARKS_FILE.read_text(encoding="utf-8"))

def save_bookmarks(data):
    BOOKMARKS_FILE.write_text(json.dumps(data, indent=2), encoding="utf-8")

#### TOOLS ####
@app.tool()
def add_bookmark(title: str, url: str, tags: list[str] | None = None) -> str:
    """Store a new bookmark with an optional list of tags."""
    bookmarks = load_bookmarks()
    bookmarks.append({"title": title.strip(), "url": url.strip(), "tags": tags or []})
    save_bookmarks(bookmarks)
    return f"Bookmark '{title}' saved."

#### RESOURCES ####
@app.resource("bookmark://{index}")
def get_bookmark(index: int) -> dict:
    """Return the bookmark stored at the given index."""
    bookmarks = load_bookmarks()
    if 0 <= index < len(bookmarks):
        return bookmarks[index]
    return {"error": "Bookmark not found."}

#### PROMPTS ####
@app.prompt()
def suggest_bookmark_title(topic: str) -> str:
    return f"Suggest a clear, catchy title for a bookmark about: {topic}"

# -----------------------------
# ENTRYPOINT
# -----------------------------
if __name__ == "__main__":
    # FastMCP defaults to the STDIO transport, which is what most
    # local MCP clients expect when they spawn the server process.
    app.run()
  
  • Tool: add_bookmark lets you store a new bookmark with optional tags.
  • Resource: get_bookmark retrieves a bookmark by index.
  • Prompt: suggest_bookmark_title helps the AI generate good titles.
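
To try it, save the code as a file (say, bookmarks_server.py; the filename is arbitrary) and run python bookmarks_server.py. The server then waits on standard input/output for an MCP client to connect and begin the handshake.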

Context stores

A key strength of MCP servers is their ability to remember things between requests. To do this, they use a context store—a place to keep data that’s needed across multiple calls or sessions.

A context store is any internal storage the server uses to hold state or memory. It could be:

  • In-memory storage (like a Python dictionary) for small, fast tasks—e.g., saving a user’s to-do list.
  • Databases (like Postgres or MongoDB) for more complex, persistent data—e.g., customer records or project tasks.
  • Vector databases for storing document embeddings and enabling semantic search.
  • External services (like Google Drive or Salesforce), where the server fetches data on demand and may cache it temporarily.

The context store allows the server to keep track of session history or intermediate results, respond faster, and support smarter, more context-aware tools—like remembering earlier steps in a conversation or already-loaded files. For example, a browser automation server might store cookies and open tabs in memory while handling a multi-step request. Or a coding assistant could cache files from a repo, so it doesn’t reload them for every question.

For prototypes, simple in-memory or file-based storage is fine. But for production use with many users or lots of data, scalable systems like Redis, SQL, or vector stores are needed. Regardless of the backend, a good context store uses clean interfaces so the rest of the server can access or update data easily.
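
To illustrate the “clean interfaces” point, here is a minimal, hypothetical context-store interface with an in-memory backend; none of these names come from MCP itself.

from typing import Any, Protocol

class ContextStore(Protocol):
    """The interface the rest of the server codes against."""
    def get(self, session_id: str, key: str) -> Any: ...
    def set(self, session_id: str, key: str, value: Any) -> None: ...

class InMemoryContextStore:
    """Dict-backed store; fine for prototypes and tests."""
    def __init__(self) -> None:
        self._data: dict[str, dict[str, Any]] = {}

    def get(self, session_id: str, key: str) -> Any:
        return self._data.get(session_id, {}).get(key)

    def set(self, session_id: str, key: str, value: Any) -> None:
        self._data.setdefault(session_id, {})[key] = value

Moving to Redis or SQL later means implementing the same two methods against the new backend, leaving the request handlers untouched.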

Session orchestration

In an MCP setup, the client (usually running on your computer or device) often handles the main logic of a session. But the server also helps manage sessions, especially when it needs to keep track of what’s happening with each client.

Each connection to the server is like a session. During that session, the server might remember certain details about the client or task. This is managed by a special part of the server called the Session Orchestrator.

But why (and when) does an MCP server need to orchestrate a session?

Imagine an MCP server that automates browsing the web. A client asks it to "find the cheapest flight from X to Y next month." That’s not a simple, one-step request. The server might:

  1. Open a browser
  2. Go to a travel website
  3. Fill out a form
  4. Look through multiple pages of results

All these actions happen in order, and the server needs to remember things like which pages are open or which filters are selected. That’s where the Session Orchestrator comes in—it manages everything during this multi-step process.

Even simpler tasks may benefit from session handling. For example:

  • A form that asks questions step by step can remember your progress.
  • A shopping assistant can keep track of what items you’ve added to your cart.
  • A server might let a client start a session, do several actions, then end it—keeping everything organized and separate from other clients.

The Session Orchestrator is responsible for:

  • Maintaining session data: Keeps temporary info like variables, user input, or results.
  • Linking requests: Makes sure each step knows what happened in the previous one.
  • Handling timeouts: Closes sessions and cleans up if the client disconnects or goes idle.
  • Managing shared tasks: In rare cases, multiple clients might work on something together (like a shared whiteboard). The orchestrator can handle this too.
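
A minimal sketch of such an orchestrator appears below; the class, its fields, and the five-minute idle timeout are illustrative assumptions rather than anything mandated by MCP.

import time
import uuid
from typing import Any

class SessionOrchestrator:
    """Tracks per-client session state and expires idle sessions."""

    def __init__(self, idle_timeout: float = 300.0) -> None:
        self._sessions: dict[str, dict[str, Any]] = {}
        self._last_seen: dict[str, float] = {}
        self._idle_timeout = idle_timeout

    def start_session(self) -> str:
        """Create an isolated state bucket for a new client."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {}
        self._last_seen[session_id] = time.monotonic()
        return session_id

    def touch(self, session_id: str) -> dict[str, Any]:
        """Fetch a session's state, marking it as recently active."""
        self._last_seen[session_id] = time.monotonic()
        return self._sessions[session_id]

    def reap_idle(self) -> None:
        """Discard sessions whose clients disconnected or went idle."""
        now = time.monotonic()
        stale = [sid for sid, seen in self._last_seen.items()
                 if now - seen > self._idle_timeout]
        for sid in stale:
            self._sessions.pop(sid, None)
            self._last_seen.pop(sid, None)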

Caching layer

MCP servers often deal with large or slow data sources, like big databases or rate-limited APIs. To stay fast and efficient, they use caching—a way to temporarily store frequently-used or expensive-to-fetch data.

Some common caching strategies that MCP servers use are:

  • In-memory caching: Stores recent results in memory for quick reuse. For example, stock prices can be cached for a few seconds to avoid repeated API calls.
  • Persistent caching: Uses tools like Redis or databases to store data across sessions or server instances—great for large data or distributed systems.
  • Multi-level caching: Combines fast memory with longer-term storage. The server checks fast caches first, then slower ones if needed.
  • Prefetching: Loads data in advance based on what the server expects the client will request soon—like fetching the next pages of a document while the user reads.

Caching can dramatically speed up performance and reduce costs, especially for AI or API-heavy tasks, but requires careful thought about invalidation – when to refresh or remove cached data.

From a developer’s perspective, the caching layer might be implemented via existing libraries or services. For instance, a Python-based MCP server could use an in-memory dictionary for simple caching, or functools.lru_cache for memoization of function calls. For more complex needs, it might connect to a Redis instance or use a caching decorator that handles key generation based on request params.
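
As a sketch of that idea, a small TTL-based caching decorator could wrap an expensive handler; the helper name and the five-second TTL below are arbitrary choices for illustration.

import time
from functools import wraps

def ttl_cache(ttl_seconds: float = 60.0):
    """Cache results in memory, expiring entries after ttl_seconds.
    For brevity, only hashable positional arguments are supported."""
    def decorator(func):
        cache: dict[tuple, tuple[float, object]] = {}

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]            # still fresh: reuse cached value
            value = func(*args)          # miss or expired: recompute
            cache[args] = (now, value)
            return value
        return wrapper
    return decorator

@ttl_cache(ttl_seconds=5.0)
def fetch_stock_price(symbol: str) -> float:
    # Hypothetical slow, rate-limited API call.
    return 123.45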

Conclusion

MCP servers make AI integrations smarter and more structured. Each part of the server has a clear role—request handlers perform actions, the context store gives memory, session management keeps interactions connected, and caching boosts performance.

MCP captures a much-needed mental model for how modern LLM applications should be constructed. As the ecosystem matures, expect MCP to serve as a foundational layer — much like Dockerfile or OpenAPI have done in their respective domains.

If you're building with LLMs and care about maintainability, composability, and control — it's time to take MCP seriously.
