MCP Performance Optimization Guide

Make your MCP servers fast with caching, connection pooling, and async patterns.

Updated February 2025 · 9 min read

Every millisecond of MCP latency adds to AI response time. Users waiting 10+ seconds for a response will abandon your tool. This guide covers the patterns that make MCP servers fast.

Why Performance Matters

Slow MCP servers create a poor user experience. When an AI assistant calls your tool and waits 5 seconds for a response, the entire conversation feels sluggish. Fast servers = better UX = more usage.

Async Everything

MCP is inherently async. Don't block the event loop:

import aiohttp
import requests

# BAD - blocks the event loop
@server.tool()
def slow_tool():
    result = requests.get("https://api.example.com")  # Blocking!
    return result.json()

# GOOD - non-blocking
@server.tool()
async def fast_tool():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com") as response:
            return await response.json()
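Sometimes a dependency only ships a synchronous client and rewriting it isn't an option. In that case you can keep the event loop responsive by pushing the blocking call onto a worker thread with asyncio.to_thread (Python 3.9+). A minimal sketch, with a hypothetical legacy_lookup standing in for the sync-only call:

```python
import asyncio
import time

def legacy_lookup(key: str) -> str:
    # Hypothetical sync-only client call (blocking sleep simulates I/O)
    time.sleep(0.01)
    return f"value-for-{key}"

async def lookup_tool(key: str) -> str:
    # Run the blocking call in a worker thread so the
    # event loop stays free to serve other requests
    return await asyncio.to_thread(legacy_lookup, key)

result = asyncio.run(lookup_tool("config"))
```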

Connection Pooling

Create connections once, reuse them:

import aiohttp
import asyncpg

class MCPServer:
    def __init__(self):
        self.session = None
        self.db_pool = None
    
    async def startup(self):
        # HTTP connection pool
        self.session = aiohttp.ClientSession(
            connector=aiohttp.TCPConnector(limit=100)
        )
        # Database connection pool
        self.db_pool = await asyncpg.create_pool(
            DATABASE_URL, 
            min_size=5, 
            max_size=20
        )
    
    async def shutdown(self):
        await self.session.close()
        await self.db_pool.close()
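To make the reuse concrete, here is a toy acquire/release pool built on asyncio.Queue. This is not how aiohttp or asyncpg implement pooling internally, just the core idea: a fixed set of connections handed out and returned, so many requests share a few connections instead of opening one each:

```python
import asyncio

class SimplePool:
    """Toy connection pool: acquire/release over an asyncio.Queue."""

    def __init__(self, make_conn, size=5):
        self._queue = asyncio.Queue()
        self._make_conn = make_conn
        self._size = size

    async def start(self):
        # Open every connection once, up front
        for _ in range(self._size):
            self._queue.put_nowait(await self._make_conn())

    async def acquire(self):
        # Waits if all connections are currently checked out
        return await self._queue.get()

    def release(self, conn):
        self._queue.put_nowait(conn)

created = 0

async def make_conn():
    # Stand-in for an expensive connect; counts how often it runs
    global created
    created += 1
    return object()

async def main():
    pool = SimplePool(make_conn, size=3)
    await pool.start()
    for _ in range(10):  # 10 requests share the same 3 connections
        conn = await pool.acquire()
        pool.release(conn)
    return created

n_created = asyncio.run(main())
```

Despite ten requests, only three connections are ever created.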

Caching Strategies

In-Memory Cache

For frequently accessed, rarely changing data:

from functools import lru_cache
from cachetools import TTLCache

# Simple LRU cache
@lru_cache(maxsize=1000)
def get_config(key):
    return load_from_database(key)

# TTL cache (synchronous, but cheap enough to use from async code)
cache = TTLCache(maxsize=1000, ttl=300)  # 5 minute TTL

async def cached_fetch(url):
    if url in cache:
        return cache[url]
    
    result = await fetch(url)
    cache[url] = result
    return result
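One caveat with the cache-aside pattern above: if many requests miss the same key at once, they all call fetch before any of them has populated the cache (a cache stampede). A sketch that de-duplicates in-flight fetches with a per-key future, using a stand-in slow_fetch for the backend call:

```python
import asyncio

cache: dict[str, str] = {}
pending: dict[str, asyncio.Future] = {}
fetch_count = 0

async def slow_fetch(key: str) -> str:
    # Stand-in for the real backend call; counts invocations
    global fetch_count
    fetch_count += 1
    await asyncio.sleep(0.05)
    return key.upper()

async def cached_fetch(key: str) -> str:
    if key in cache:
        return cache[key]
    if key in pending:
        # Another task is already fetching this key: wait for its result
        return await pending[key]
    fut = asyncio.get_running_loop().create_future()
    pending[key] = fut
    try:
        value = await slow_fetch(key)
    except Exception as exc:
        fut.set_exception(exc)
        raise
    else:
        cache[key] = value
        fut.set_result(value)
        return value
    finally:
        del pending[key]

async def main():
    # 10 concurrent misses on the same key -> only 1 backend call
    return await asyncio.gather(*(cached_fetch("user:1") for _ in range(10)))

results = asyncio.run(main())
```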

Redis for Distributed Caching

When running multiple MCP server instances:

import json

import redis.asyncio as redis

class CachedMCPServer:
    def __init__(self):
        self.redis = redis.Redis(host='localhost', port=6379)
    
    async def get_cached(self, key, fetch_func, ttl=300):
        # Try cache first
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        
        # Cache miss - fetch and store
        result = await fetch_func()
        await self.redis.setex(key, ttl, json.dumps(result))
        return result

Batch Operations

Combine multiple requests into one:

@server.tool()
async def get_users_batch(user_ids: list[str]):
    # BAD: N database queries
    # users = [await db.get_user(id) for id in user_ids]
    
    # GOOD: 1 database query
    users = await db.get_users_where_id_in(user_ids)
    return users
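When the backend offers no batched endpoint, the next best thing is issuing the per-item calls concurrently with asyncio.gather, so total latency is roughly one call rather than N. A sketch with a hypothetical fetch_user simulating 50 ms of lookup latency:

```python
import asyncio
import time

async def fetch_user(user_id: str) -> dict:
    # Hypothetical per-user lookup with simulated 50 ms latency
    await asyncio.sleep(0.05)
    return {"id": user_id}

async def fetch_all(user_ids: list[str]) -> list[dict]:
    # All lookups run concurrently; results come back in input order
    return await asyncio.gather(*(fetch_user(u) for u in user_ids))

start = time.perf_counter()
users = asyncio.run(fetch_all(["a", "b", "c", "d"]))
elapsed = time.perf_counter() - start
```

Four sequential lookups would take ~200 ms; the concurrent version finishes in roughly the time of one.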

Streaming Responses

For large results, stream instead of buffering:

import aiofiles
from starlette.responses import StreamingResponse  # or your framework's equivalent

@server.tool()
async def stream_large_file(path: str):
    async def generate():
        async with aiofiles.open(path, 'r') as f:
            async for line in f:
                yield line
    
    return StreamingResponse(generate())
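The payoff is bounded memory: only one chunk is in flight at a time, regardless of file size. A stdlib-only sketch of the same idea (plain open standing in for aiofiles, yielding fixed-size chunks and handing the loop a turn between reads):

```python
import asyncio
import os
import tempfile

async def read_chunks(path: str, chunk_size: int = 4):
    # Memory use stays ~chunk_size no matter how large the file is
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
            await asyncio.sleep(0)  # give the event loop a turn

async def main() -> list[bytes]:
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"hello world")
    try:
        return [c async for c in read_chunks(path)]
    finally:
        os.unlink(path)

chunks = asyncio.run(main())
```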

Timeout Handling

Don't let slow operations hang forever:

import asyncio

@server.tool()
async def fetch_with_timeout(url: str):
    try:
        return await asyncio.wait_for(
            fetch(url),
            timeout=5.0  # 5 second timeout
        )
    except asyncio.TimeoutError:
        return {"error": "Request timed out"}
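A self-contained demo of the pattern firing, with an artificially slow coroutine in place of fetch:

```python
import asyncio

async def slow_op() -> str:
    await asyncio.sleep(1.0)  # deliberately slower than the timeout
    return "done"

async def guarded():
    try:
        return await asyncio.wait_for(slow_op(), timeout=0.05)
    except asyncio.TimeoutError:
        # Return a structured error instead of hanging the conversation
        return {"error": "Request timed out"}

result = asyncio.run(guarded())
```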

Benchmarking

Measure before optimizing:

import time
import statistics

async def benchmark_tool(tool_func, iterations=100):
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        await tool_func()
        times.append(time.perf_counter() - start)
    
    return {
        "mean": statistics.mean(times) * 1000,  # ms
        "median": statistics.median(times) * 1000,
        "p95": sorted(times)[int(iterations * 0.95)] * 1000,
        "min": min(times) * 1000,
        "max": max(times) * 1000,
    }
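To sanity-check the harness end to end, run it against a stub tool with ~1 ms of simulated work (the copy below trims the stats to mean and median to stay self-contained):

```python
import asyncio
import statistics
import time

async def noop_tool():
    # Stub tool: ~1 ms of simulated work
    await asyncio.sleep(0.001)

async def benchmark_tool(tool_func, iterations: int = 50) -> dict:
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        await tool_func()
        times.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(times) * 1000,    # ms
        "median": statistics.median(times) * 1000,
    }

stats = asyncio.run(benchmark_tool(noop_tool))
```

Expect a mean slightly above 1 ms: asyncio.sleep never wakes early, and timer granularity adds a little on top.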

Target Metrics

Metric                 Target    Acceptable
Tool latency (p50)     <100ms    <500ms
Tool latency (p95)     <500ms    <2s
Memory per request     <10MB     <50MB
Connections reused     >90%      >70%

Performance Checklist

  • ☐ All I/O operations are async
  • ☐ Connection pools for HTTP and database
  • ☐ Caching for repeated queries (TTL appropriate)
  • ☐ Batch operations where possible
  • ☐ Timeouts on all external calls
  • ☐ Streaming for large responses
  • ☐ Profiled and benchmarked critical paths


Written by Kai Gritun. Building tools for AI developers.