MCP Server Patterns for Tool-Augmented LLMs

The Model Context Protocol (MCP) has become a widely adopted interface for giving LLMs access to external tools and data sources. Instead of building custom tool-calling integrations for each model provider, you build an MCP server once and any compatible client can use it.

But "build an MCP server" glosses over the design decisions that determine whether your server is reliable, secure, and fast. Here are the patterns that work.

MCP Architecture in 30 Seconds

An MCP server exposes three primitives:

Tools: Functions the LLM can call (execute query, create file, send message)
Resources: Data the LLM can read (file contents, database schemas, API docs)
Prompts: Reusable prompt templates with parameters

Tool Design: The Most Important Decision

The quality of your tools determines the quality of LLM interactions. A poorly designed tool confuses the model and produces bad tool calls.

Rule 1: One tool, one action. Don't build a database tool that accepts an action parameter. Build query_database, insert_record, update_record, and delete_record separately. The model selects tools by name and description — distinct tools with clear names get called correctly far more often.
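
A minimal sketch of the split, using an in-memory SQLite database so it runs on its own. The `notes` table and both function bodies are illustrative assumptions, and registering the functions with an MCP SDK is left out.

```python
import sqlite3

# Hypothetical in-memory database, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")


def insert_record(body: str) -> int:
    """Insert one note and return its new id."""
    cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    conn.commit()
    return cur.lastrowid


def query_database(limit: int = 10) -> list[dict]:
    """Return up to `limit` notes, oldest first."""
    rows = conn.execute(
        "SELECT id, body FROM notes ORDER BY id LIMIT ?", (limit,)
    ).fetchall()
    return [{"id": row[0], "body": row[1]} for row in rows]
```

Each function does exactly one thing, so its name and docstring can describe it without an "if action is X" clause.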

Rule 2: Descriptions are prompts. The tool description is injected into the model's context. Write it like you're explaining the tool to a competent developer who has never seen your system: say what the tool does, when to use it, and what it returns.
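
As a sketch of what such a description could look like, here is a tool definition written as a plain dictionary. The `name`, `description`, and `inputSchema` fields follow the shape MCP uses for tool definitions; the tool itself, the "analytics database", and the parameter names are hypothetical.

```python
# Hypothetical tool definition; the name, wording, and schema are illustrative.
QUERY_DATABASE_TOOL = {
    "name": "query_database",
    "description": (
        "Run a read-only SQL SELECT against the analytics database and "
        "return matching rows as JSON. Use this to look up records; it "
        "cannot modify data. Results are capped at `limit` rows."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "sql": {
                "type": "string",
                "description": "A single SELECT statement.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of rows to return.",
                "default": 100,
            },
        },
        "required": ["sql"],
    },
}
```

The description states the purpose, the main constraint (read-only), and the output shape, which is the information the model needs to choose and call the tool.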

Rule 3: Return structured data, not natural language. When a tool returns results, return JSON or a consistent text format. The model processes structured output more reliably than free-form text. Include metadata like row counts, truncation indicators, and error details.
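
One possible shape for such a result, as a sketch. The field names (`row_count`, `returned`, `truncated`, `rows`, `error`) are assumptions chosen for illustration, not a format the protocol requires.

```python
import json


def format_rows(rows: list[dict], max_rows: int = 50) -> str:
    """Serialize query rows as JSON, with metadata the model can rely on."""
    shown = rows[:max_rows]
    return json.dumps({
        "row_count": len(rows),             # total rows the query matched
        "returned": len(shown),             # rows included in this response
        "truncated": len(rows) > max_rows,  # True if rows were cut off
        "rows": shown,
        "error": None,
    })
```

With the `truncated` flag present, the model can tell an empty tail from a capped result and ask for a narrower query.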

Transport Selection

MCP supports multiple transport mechanisms. The choice depends on your deployment model.

stdio — The server runs as a subprocess, communicating via stdin/stdout. Simplest setup, lowest latency, but limited to local execution. This is what Claude Code and most IDE integrations use.
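
To make the wire format concrete, here is a hand-rolled sketch of the stdio loop: one JSON-RPC 2.0 message per line in, one per line out. It answers only `ping` and is not a substitute for an SDK; a real server would also handle `initialize`, `tools/list`, `tools/call`, and the rest. The function names are hypothetical.

```python
import json
import sys
from typing import Optional


def handle_message(line: str) -> Optional[str]:
    """Answer one newline-delimited JSON-RPC 2.0 message, if it needs a reply."""
    request = json.loads(line)
    if "id" not in request:
        return None  # notifications carry no id and get no response
    if request.get("method") == "ping":
        response = {"jsonrpc": "2.0", "id": request["id"], "result": {}}
    else:
        response = {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32601, "message": "Method not found"},
        }
    return json.dumps(response)


def serve_stdio() -> None:
    """Read messages from stdin and write replies to stdout, one per line."""
    for line in sys.stdin:
        if not line.strip():
            continue
        reply = handle_message(line)
        if reply is not None:
            sys.stdout.write(reply + "\n")
            sys.stdout.flush()
```

Calling `serve_stdio()` at the end of the script turns it into a subprocess a client can launch. Because stdout carries protocol messages, logging should go to stderr.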

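SSE (Server-Sent Events) — HTTP-based transport. The client sends requests as HTTP POST and receives responses over a long-lived SSE stream. It works across networks and is easy to deploy behind a reverse proxy. Newer revisions of the MCP specification deprecate this transport in favor of Streamable HTTP, so it is mainly useful for compatibility with existing clients.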
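
On the wire, each server-to-client message is a small text frame. The helper below is a sketch of that framing only; the function name is hypothetical, and the HTTP server that would stream these frames is left out.

```python
import json


def format_sse(payload: dict, event: str = "message") -> str:
    """Encode one JSON payload as a Server-Sent Events frame."""
    data = json.dumps(payload)  # compact JSON, so it fits on one data line
    return f"event: {event}\ndata: {data}\n\n"
```

The blank line at the end terminates the event; without it, the client keeps buffering and never dispatches the message.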

Streamable HTTP — The newest transport option. Single HTTP endpoint that supports bidirectional streaming. Better for stateless deployments and serverless functions.

Transport        Latency  Deployment          Statefulness      Best For
stdio            Lowest   Local only          Stateful          IDE plugins, CLI tools
SSE              Low      Network             Stateful          Internal services
Streamable HTTP  Low      Network/Serverless  Stateless option  Cloud deployments, multi-tenant

Authentication Patterns

For remote MCP servers, you need authentication. The MCP specification defines an OAuth-based authorization flow for HTTP transports, but implementation patterns vary.

Pattern 1: Token passthrough. The MCP client includes a bearer token in the initial connection. The server validates it against your auth service. Simple, works for single-tenant deployments.
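
A minimal sketch of the server-side check. The static `EXPECTED_TOKEN` stands in for a call to your auth service and is purely an assumption for illustration, as are the function names.

```python
import hmac
from typing import Optional

# Hypothetical static token; a real server would ask its auth service instead.
EXPECTED_TOKEN = "example-token"


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        return None
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return None
    return token.strip()


def is_authorized(authorization: Optional[str]) -> bool:
    """Validate the bearer token using a constant-time comparison."""
    token = extract_bearer_token(authorization)
    if token is None:
        return False
    return hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())
```

`hmac.compare_digest` avoids leaking how many leading characters matched through response timing.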

Pattern 2: OAuth 2.0 with PKCE. For multi-tenant servers where users authenticate via a browser flow. The MCP client initiates the OAuth flow, the user authenticates in a browser, and the client receives a token.
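
The PKCE part of that flow is small enough to sketch. The client generates a random verifier, sends its SHA-256 hash (the challenge) with the authorization request, and later proves possession by sending the verifier with the token request. The function names below are hypothetical; the S256 transformation follows RFC 7636.

```python
import base64
import hashlib
import secrets


def challenge_for(verifier: str) -> str:
    """Compute the S256 code challenge: base64url(SHA-256(verifier)), unpadded."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair."""
    verifier = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    return verifier, challenge_for(verifier)
```

The verifier never leaves the client until the token exchange, so an intercepted authorization code is useless on its own.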

Pattern 3: API key with scoping. Each client gets an API key that maps to a set of allowed tools and resources. Simple to implement, good for service-to-service communication.
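
A sketch of the key-to-scope lookup. The keys and tool names in the table are made up for illustration; in practice the mapping would live in a database or secrets store.

```python
# Hypothetical key-to-scope table; keys and tool names are illustrative.
API_KEY_SCOPES = {
    "key-reporting": {"query_database"},
    "key-admin": {"query_database", "insert_record", "delete_record"},
}


def is_tool_allowed(api_key: str, tool_name: str) -> bool:
    """Return True only if the key exists and its scope includes the tool."""
    return tool_name in API_KEY_SCOPES.get(api_key, set())
```

Running this check before dispatching a tool call means an unknown key and an out-of-scope tool are both rejected by the same code path.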

Resource Management

Resources expose data that the LLM can read without executing a function. Think of them as a filesystem-like interface to your data.
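
As a sketch of that filesystem-like interface, resources can be modeled as a map from URI to a reader function. The URI schemes and contents below are hypothetical; the `uri`, `mimeType`, and `text` fields mirror the shape MCP uses for text resource contents.

```python
from typing import Callable

# Hypothetical URIs and contents, for illustration only.
RESOURCES: dict[str, Callable[[], str]] = {
    "schema://orders": lambda: "orders(id INTEGER, total REAL)",
    "docs://api/overview": lambda: "Overview of the example API.",
}


def read_resource(uri: str) -> dict:
    """Resolve a URI to read-only text content; raises KeyError if unknown."""
    reader = RESOURCES[uri]
    return {"uri": uri, "mimeType": "text/plain", "text": reader()}
```

Reading a resource has no side effects, which is what distinguishes it from calling a tool.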

Pattern: Lazy loading with caching. Don't load all resources at startup. Load them on first access and cache with a TTL. For large datasets (API docs, codebase indexes), use a resource template that loads specific sections on demand.
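
A small sketch of lazy loading with a TTL. The `TTLCache` class and its `loader` callback are hypothetical names; the clock is injectable so the expiry logic can be tested without sleeping.

```python
import time
from typing import Callable


class TTLCache:
    """Load values on first access and reload them after `ttl` seconds."""

    def __init__(
        self,
        loader: Callable[[str], str],
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: skip the loader
        value = self._loader(key)  # first access or expired: load now
        self._entries[key] = (now, value)
        return value
```

Nothing is loaded at construction time, so startup cost stays flat no matter how many resources the server exposes.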

Pattern: Resource subscriptions. If your resources change (live database schema, updated docs), implement notifications so the client knows to re-fetch.
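
A sketch of the bookkeeping involved. The class and method names are hypothetical; the `notifications/resources/updated` method and its `uri` parameter are the ones MCP defines for this purpose. Actually sending the message over the transport is left out.

```python
import json
from typing import Optional


class ResourceSubscriptions:
    """Track subscribed URIs and build update notifications for them."""

    def __init__(self) -> None:
        self._subscribed: set[str] = set()

    def subscribe(self, uri: str) -> None:
        self._subscribed.add(uri)

    def unsubscribe(self, uri: str) -> None:
        self._subscribed.discard(uri)

    def notification_for(self, uri: str) -> Optional[str]:
        """Return a JSON-RPC notification if the client subscribed to `uri`."""
        if uri not in self._subscribed:
            return None
        return json.dumps({
            "jsonrpc": "2.0",
            "method": "notifications/resources/updated",
            "params": {"uri": uri},
        })
```

The notification carries only the URI, not the new content, so the client decides when to re-read the resource.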

Error Handling and Timeouts

MCP tool calls can fail in ways that the LLM needs to understand. Don't just throw exceptions — return error information the model can reason about.
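
One way to do that, sketched with `asyncio`. The wrapper name, the error codes (`timeout`, `invalid_input`), and the result fields including `hint` are assumptions for illustration, not a format the protocol mandates.

```python
import asyncio
import json
from typing import Awaitable, Callable


async def call_with_timeout(
    tool: Callable[[], Awaitable[dict]], timeout_s: float
) -> str:
    """Run a tool and convert failures into structured, model-readable errors."""
    try:
        result = await asyncio.wait_for(tool(), timeout=timeout_s)
        return json.dumps({"ok": True, "result": result})
    except asyncio.TimeoutError:
        return json.dumps({
            "ok": False,
            "error": "timeout",
            "message": f"Tool did not finish within {timeout_s} seconds.",
            "hint": "Narrow the request (add a filter or lower the limit) and retry.",
        })
    except ValueError as exc:
        return json.dumps({
            "ok": False,
            "error": "invalid_input",
            "message": str(exc),
            "hint": "Check the argument values against the tool's input schema.",
        })
```

Only expected failure types are caught here; anything else still propagates, so genuine bugs stay visible in server logs.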

The hint field matters. When the model sees a structured error with a hint, it can self-correct and retry with a modified tool call. Without it, the model often repeats the same failing call or gives up.

Composing Multiple MCP Servers

In production, you'll have multiple MCP servers — one for database access, one for file management, one for external APIs. The MCP client connects to all of them simultaneously, and the model sees a unified tool palette.

Keep servers focused on a single domain. A database server shouldn't also manage files. This makes each server simpler to test, deploy, and secure independently.

Name tools to avoid collisions across servers. Prefix tool names with the domain: db_query, db_insert, files_read, files_write, github_create_issue. The model uses these prefixes to understand which domain a tool belongs to, improving tool selection accuracy.
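
A sketch of how a unified palette could be assembled and checked for collisions. The function name and the server-to-tools mapping are hypothetical; this is not how any particular client implements it.

```python
def merge_tool_names(servers: dict[str, list[str]]) -> dict[str, str]:
    """Map each prefixed tool name to its owning server, rejecting collisions."""
    merged: dict[str, str] = {}
    for prefix, tools in servers.items():
        for tool in tools:
            name = f"{prefix}_{tool}"
            if name in merged:
                raise ValueError(f"duplicate tool name: {name}")
            merged[name] = prefix
    return merged
```

For example, `merge_tool_names({"db": ["query"], "files": ["read"]})` yields `db_query` and `files_read`. The explicit collision check is still worth keeping, because prefixes that themselves contain underscores can produce the same final name.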

MCP servers are the interface layer between LLMs and your systems. Treat them like APIs — design clear contracts, validate inputs, handle errors gracefully, and version them. The time you invest in tool design pays back every time the model makes a correct tool call instead of hallucinating an action.
