TL;DR — Model Context Protocol (MCP) is the standard way to expose tools, resources, and data to AI agents. It hit 97 million installs by mid-2026 and is supported by Anthropic, OpenAI, Google, Cursor, and most major AI tools. Most tutorials stop at the hello-world echo server. This is the production guide — auth, tool registration, streaming, error handling, and observability — from someone who runs MCP servers in production at Modelia.
What Is MCP, In One Paragraph
Model Context Protocol is a JSON-RPC-based protocol that lets AI agents (Claude, GPT, Gemini, etc.) discover and call tools, read resources, and fetch prompt templates exposed by external servers. The protocol is transport-agnostic: the specification defines stdio and Streamable HTTP (which supersedes the older HTTP+SSE transport), custom transports such as WebSocket are possible, and the messages are the same regardless of which model is driving the conversation. If you have ever used an IDE-integrated AI tool such as Cursor or Claude Code with external tool integrations, there is a good chance you have already used MCP under the hood.
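To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 requests a client sends to list and call tools. The method names `tools/list` and `tools/call` come from the MCP specification; the `makeRequest` helper and the `nextId` counter are illustrative assumptions, and the tool name matches the example server later in this guide.

```typescript
// Shape of a JSON-RPC 2.0 request as MCP uses it on the wire.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1; // hypothetical per-connection request counter

// Hypothetical helper: build a request with a fresh id.
function makeRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const request: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) request.params = params;
  return request;
}

// Discover the tools a server exposes.
const listTools = makeRequest("tools/list");

// Call one tool by name with JSON arguments.
const callTool = makeRequest("tools/call", {
  name: "generate_image",
  arguments: { prompt: "neon city", aspect: "16:9" },
});

// Each message is serialized as one JSON document on the transport.
const wireMessages = [listTools, callTool].map((m) => JSON.stringify(m));
```

The server answers each request with a response carrying the same `id`, which is how a client matches results to calls when several are in flight.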
Why MCP Matters
Before MCP, every AI agent had its own bespoke tool-calling format. OpenAI's function calling, Anthropic's tool use, Google's function declarations — same idea, different schemas, different transports, different observability. If you built a tool for one agent, you rebuilt it for every other agent.
MCP collapses that. Write the tool once, expose it as an MCP server, and every agent can call it. That is why the install graph went exponential through 2025–2026.

For the broader context on agentic systems and how MCP fits in, see building agentic AI systems and Claude agentic workflows in production.
The MCP Object Model
Four primitives. Learn these and the protocol falls open:
- Tools: functions the agent can call. Each has a name, a JSON Schema for its input, and returns content.
- Resources: data the agent can read by URI. Files, database rows, API responses.
- Prompts: parametrized prompt templates the server exposes to the client.
- Sampling: the server asks the client to run an LLM completion on its behalf (rare, but useful for nested agents).
A production MCP server typically ships 5–30 tools, 2–10 resource handlers, and 1–3 prompts. Anything more is usually a sign the server should be split.
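Each primitive maps onto a small set of JSON-RPC methods. The sketch below lists them; the method names are taken from the MCP specification, while the table layout and the `primitiveFor` helper are illustrative assumptions.

```typescript
type PrimitiveName = "tools" | "resources" | "prompts" | "sampling";

interface PrimitiveInfo {
  // Which side sends the request.
  direction: "client-to-server" | "server-to-client";
  // JSON-RPC method names defined by the MCP specification.
  methods: string[];
}

const PRIMITIVES: Record<PrimitiveName, PrimitiveInfo> = {
  tools: { direction: "client-to-server", methods: ["tools/list", "tools/call"] },
  resources: { direction: "client-to-server", methods: ["resources/list", "resources/read"] },
  prompts: { direction: "client-to-server", methods: ["prompts/list", "prompts/get"] },
  // Sampling runs the other way: the server sends the request.
  sampling: { direction: "server-to-client", methods: ["sampling/createMessage"] },
};

// Hypothetical helper: find which primitive a method belongs to.
function primitiveFor(method: string): PrimitiveName | undefined {
  for (const name of Object.keys(PRIMITIVES) as PrimitiveName[]) {
    if (PRIMITIVES[name].methods.indexOf(method) !== -1) return name;
  }
  return undefined;
}
```

A lookup like this is handy when tagging log lines or metrics by primitive rather than by raw method string.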
Building a Production MCP Server in TypeScript
Here is the skeleton I actually use. Stdio transport (works with Claude Desktop and Claude Code), tool registration, structured error handling.

import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  CallToolRequestSchema,
  ListToolsRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { z } from "zod";

const server = new Server(
  { name: "modelia-image-tools", version: "1.0.0" },
  { capabilities: { tools: {} } }
);

// 1. List tools the agent can call
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "generate_image",
      description: "Generate an AI image from a prompt. Returns a CDN URL.",
      inputSchema: {
        type: "object",
        properties: {
          prompt: { type: "string", minLength: 3 },
          aspect: { type: "string", enum: ["1:1", "4:5", "16:9"] },
        },
        required: ["prompt"],
      },
    },
  ],
}));

// 2. Handle tool invocations
const GenerateImageInput = z.object({
  prompt: z.string().min(3),
  aspect: z.enum(["1:1", "4:5", "16:9"]).default("1:1"),
});

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  if (req.params.name !== "generate_image") {
    throw new Error(`Unknown tool: ${req.params.name}`);
  }
  // Validate agent-supplied arguments before touching the pipeline
  const input = GenerateImageInput.parse(req.params.arguments);
  // ... call your image-gen pipeline here (runImageGenPipeline is your own function)
  const imageUrl = await runImageGenPipeline(input);
  return {
    content: [
      // Return the CDN URL as text. An "image" content item expects
      // base64-encoded bytes in `data`, not a URL.
      { type: "text", text: `Generated: ${imageUrl}` },
    ],
  };
});
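
// 3. Start the transport
const transport = new StdioServerTransport();
await server.connect(transport);

That is a complete MCP server once runImageGenPipeline points at your own pipeline. Register it in your client's MCP config (claude_desktop_config.json for Claude Desktop, .mcp.json for Claude Code) and the client can call it.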
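With more than a handful of tools, the single `if` check in the handler above turns into a long chain. One option is a registry keyed by tool name. The following is a minimal sketch that runs without the SDK or Zod; `ToolDef`, `registry`, `dispatch`, `parsePrompt`, and the `echo_upper` tool are all hypothetical names, and the hand-rolled validator stands in for a real schema.

```typescript
// Result shape mirrors the `content` array returned by the handler above.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical tool definition: a validator plus an implementation.
interface ToolDef<I> {
  description: string;
  parse: (args: unknown) => I; // throws on invalid input
  run: (input: I) => Promise<string>;
}

// Hand-rolled validator standing in for a Zod schema.
function parsePrompt(args: unknown): { prompt: string } {
  const prompt = (args as { prompt?: unknown } | null | undefined)?.prompt;
  if (typeof prompt !== "string" || prompt.length < 3) {
    throw new Error("prompt must be a string of at least 3 characters");
  }
  return { prompt };
}

const registry = new Map<string, ToolDef<any>>();
registry.set("echo_upper", {
  description: "Hypothetical demo tool: upper-cases the prompt.",
  parse: parsePrompt,
  run: async ({ prompt }: { prompt: string }) => prompt.toUpperCase(),
});

// One dispatch function replaces the per-tool `if` chain.
async function dispatch(name: string, args: unknown): Promise<ToolResult> {
  const tool = registry.get(name);
  if (!tool) {
    return { isError: true, content: [{ type: "text", text: `Unknown tool: ${name}` }] };
  }
  const input = tool.parse(args); // validation errors propagate to the caller
  return { content: [{ type: "text", text: await tool.run(input) }] };
}
```

In a real server the body of the `CallToolRequestSchema` handler would shrink to a single `dispatch(req.params.name, req.params.arguments)` call, and the `ListToolsRequestSchema` handler could be generated from the same registry so the two never drift apart.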
Production Concerns That Tutorials Skip
Auth
Stdio MCP servers inherit the parent process's credentials. SSE/HTTP servers need explicit auth. The pattern I use: the MCP server reads a per-tenant API key from an X-Tenant-Token header on the SSE handshake, then issues short-lived JWTs scoped to that tenant for downstream calls.

import express from "express";
import jwt from "jsonwebtoken";

const app = express();

app.get("/mcp", async (req, res) => {
  // Reject requests that carry no tenant token before doing any lookup
  const token = req.header("X-Tenant-Token");
  if (!token) return res.status(401).end();
  // verifyTenantToken is your own tenant lookup (not shown here)
  const tenant = await verifyTenantToken(token);
  if (!tenant) return res.status(401).end();
  // Issue a scoped session JWT the MCP handlers can read
  const sessionJwt = jwt.sign({ tenant: tenant.id }, process.env.MCP_SIGNING_KEY!, {
    expiresIn: "1h",
  });
  // Hand off to MCP SSE transport...
});

Streaming Long Tool Calls
Image generation, video rendering, long DB queries: anything that runs longer than about 5 seconds should stream progress. The protocol supports this through a progressToken supplied by the client and notifications/progress messages sent by the server.

// Inside a tool handler: emit progress only if the client asked for it
const progressToken = req.params._meta?.progressToken;
if (progressToken !== undefined) {
  await server.notification({
    method: "notifications/progress",
    params: { progressToken, progress: 0.42, total: 1.0 },
  });
}

Clients that render progress surface this as a live progress indicator. Without it, your tool feels broken for the 30 seconds it actually takes.
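Long handlers usually report progress from several places, so it can help to centralize the rules. Below is a minimal sketch of a reporter that does nothing when the client sent no progressToken and drops updates that do not increase the progress value (the specification requires progress to increase with each notification). `makeProgressReporter` and the `Notify` type are hypothetical; the notification payload shape follows the protocol.

```typescript
// Payload shape of an MCP progress notification.
interface ProgressNotification {
  method: "notifications/progress";
  params: { progressToken: string | number; progress: number; total?: number };
}

// Injected sender, so the helper does not depend on a particular SDK object.
type Notify = (n: ProgressNotification) => Promise<void>;

// Hypothetical helper: returns a function that reports progress and
// resolves to true only when a notification was actually sent.
function makeProgressReporter(
  notify: Notify,
  progressToken: string | number | undefined,
  total: number,
) {
  let last = -1;
  return async (progress: number): Promise<boolean> => {
    // No token means the client did not ask for progress updates.
    if (progressToken === undefined || progress <= last) return false;
    last = progress;
    await notify({
      method: "notifications/progress",
      params: { progressToken, progress, total },
    });
    return true;
  };
}
```

Inside a handler this might be wired up as `makeProgressReporter((n) => server.notification(n), req.params._meta?.progressToken, 100)`, assuming the `server` and `req` objects from the snippets above.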
Error Handling
Tools fail. Wrap every handler in a try/catch and return structured errors as content, not exceptions. Exceptions look like server crashes to the agent; structured errors let the agent reason and retry.

try {
  const result = await doTheThing(input);
  return { content: [{ type: "text", text: result }] };
} catch (err) {
  // Thrown values are not always Error instances
  const message = err instanceof Error ? err.message : String(err);
  return {
    isError: true,
    content: [
      {
        type: "text",
        text: `Tool failed: ${message}. Retry with a smaller batch.`,
      },
    ],
  };
}

The isError: true field tells the client that the tool itself failed, so the model sees the message and can decide whether a retry is worthwhile.
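Repeating that try/catch in every handler gets tedious. A minimal sketch of a wrapper that applies it once is shown here; `withStructuredErrors`, `ToolHandler`, and `ToolResult` are hypothetical names, not part of the MCP SDK.

```typescript
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

type ToolHandler<I> = (input: I) => Promise<ToolResult>;

// Hypothetical wrapper: converts any thrown value into an isError result
// so the agent receives a readable failure instead of a protocol error.
function withStructuredErrors<I>(toolName: string, handler: ToolHandler<I>): ToolHandler<I> {
  return async (input) => {
    try {
      return await handler(input);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return {
        isError: true,
        content: [{ type: "text", text: `${toolName} failed: ${message}` }],
      };
    }
  };
}
```

Keeping the tool name in the message helps when an agent chains several tools and has to work out which one to retry.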
Observability
Every tool invocation should emit a span with: tool name, tenant, input hash, latency, success/failure, output token count. I send these to OpenTelemetry + Langfuse. Without observability you cannot debug agent loops in production.
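As a rough illustration of those span fields, here is a minimal sketch that wraps a tool call, times it, and emits one record whether the call succeeds or throws. It deliberately avoids the OpenTelemetry and Langfuse APIs so it runs standalone: `traced`, `ToolSpan`, `hashInput`, and the `emit` callback are hypothetical, the hash is a non-cryptographic FNV-1a-style hash over UTF-16 code units, and output token count is left out because it depends on your tokenizer.

```typescript
// Span fields from the paragraph above, minus output token count.
interface ToolSpan {
  tool: string;
  tenant: string;
  inputHash: string;
  latencyMs: number;
  success: boolean;
}

// FNV-1a-style 32-bit hash, used only to keep raw inputs out of telemetry.
// Swap in SHA-256 if you need collision resistance.
function hashInput(input: unknown): string {
  const text = String(JSON.stringify(input));
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical wrapper: runs fn, then emits exactly one span either way.
async function traced<T>(
  meta: { tool: string; tenant: string; input: unknown },
  emit: (span: ToolSpan) => void,
  fn: () => Promise<T>,
): Promise<T> {
  const started = Date.now();
  let success = false;
  try {
    const result = await fn();
    success = true;
    return result;
  } finally {
    emit({
      tool: meta.tool,
      tenant: meta.tenant,
      inputHash: hashInput(meta.input),
      latencyMs: Date.now() - started,
      success,
    });
  }
}
```

In production the `emit` callback would forward to whatever exporter you use; keeping it injectable also makes the wrapper easy to unit-test.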
For the broader observability and error-recovery patterns that live around MCP servers, see agentic AI error recovery and observability.
How to Test an MCP Server
The MCP Inspector is the right starting point:

npx @modelcontextprotocol/inspector node dist/server.js

It opens a web UI where you can list tools, call them with arbitrary inputs, and see the raw protocol traffic. This is your unit-test harness during development.
For automated tests, the MCP SDK ships a client you can spawn against your own server:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "node",
  args: ["dist/server.js"],
});
const client = new Client({ name: "test", version: "1.0.0" }, { capabilities: {} });
await client.connect(transport);

const result = await client.callTool({
  name: "generate_image",
  arguments: { prompt: "neon city" },
});

Wire that into your CI as a smoke test for every PR.
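A smoke test is only useful if it fails loudly. Here is a minimal sketch of an assertion helper for the `result` returned by `client.callTool` above. `assertToolSucceeded` and `CallToolResultLike` are hypothetical; the interface models only the fields this check reads, so a cast may be needed when passing the SDK's own result type.

```typescript
// Only the fields this check needs; the SDK's result type is richer.
interface ContentItem {
  type: string;
  text?: string;
}
interface CallToolResultLike {
  content?: ContentItem[];
  isError?: boolean;
}

// Hypothetical assertion: throws unless the tool succeeded and its
// combined text output matches the expected pattern.
function assertToolSucceeded(result: CallToolResultLike, expected: RegExp): string {
  if (result.isError) throw new Error("tool reported isError: true");
  const text = (result.content ?? [])
    .filter((item) => item.type === "text" && typeof item.text === "string")
    .map((item) => item.text as string)
    .join("\n");
  if (!expected.test(text)) throw new Error(`unexpected tool output: ${text}`);
  return text;
}
```

For the image tool, a check such as `assertToolSucceeded(result, /^Generated: /)` would confirm the handler returned its text item, assuming the output format shown earlier.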
Common MCP Pitfalls
- Don't return giant blobs. Clients may truncate or reject resource content larger than about 1 MB. Return a URL or paginate.
- Don't trust agent-provided input. Validate with Zod / Pydantic in every handler. Agents will hallucinate arguments.
- Don't share state across tool calls without isolation. Per-tenant scope at minimum. Per-request when possible.
- Don't put your MCP server behind a proxy with a 60-second timeout and hope for the best. Use SSE / HTTP-streamable and stream early.
- Don't skip the inspector. It catches schema bugs before users do.
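The first pitfall suggests paginating large payloads. A minimal sketch of cursor-based paging over a large string follows; `paginate`, `Page`, and the default page size are hypothetical choices, and the cursor is simply a character offset encoded as a string.

```typescript
interface Page {
  text: string;
  nextCursor?: string; // absent on the last page
}

// Hypothetical helper: slice a large string into pages addressed by an
// opaque cursor. Offsets count UTF-16 code units, so a page boundary can
// split a surrogate pair; align to line breaks if that matters.
function paginate(full: string, cursor: string | undefined, pageSize = 64 * 1024): Page {
  const start = cursor === undefined ? 0 : Number.parseInt(cursor, 10);
  if (!Number.isInteger(start) || start < 0 || start > full.length) {
    throw new Error(`Invalid cursor: ${cursor}`);
  }
  const end = Math.min(start + pageSize, full.length);
  const page: Page = { text: full.slice(start, end) };
  if (end < full.length) page.nextCursor = String(end);
  return page;
}
```

A tool can accept an optional `cursor` argument, return one page plus `nextCursor` in its text output, and let the agent ask for more only when it needs it.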
Bottom Line
MCP is the dial-tone for agent tools in 2026. If you build for Claude, GPT, Gemini, or anything in between, write your tools once as an MCP server and let every agent call them. The protocol is small enough to learn in an afternoon. The hard part — and the part most tutorials skip — is the auth, streaming, error handling, and observability that turn a hello-world MCP server into a production one.
If you build one and want feedback on it, find me on LinkedIn. I'd rather review one good MCP server than ten more tutorials.
Frequently Asked Questions
What is Model Context Protocol (MCP)?
MCP is a JSON-RPC-based protocol that lets AI agents discover and call tools, read resources, and fetch prompt templates exposed by external servers. It is transport-agnostic (the specification defines stdio and Streamable HTTP, which supersedes the older HTTP+SSE transport, and custom transports such as WebSocket are possible) and works with Claude, GPT, Gemini, Cursor, and most major AI tools.
Why has MCP grown so fast?
Before MCP, every AI agent had its own bespoke tool-calling format. MCP collapses that — write the tool once as an MCP server and every agent can call it. That standardization is why installs crossed 97 million by mid-2026.
What are MCP tools, resources, and prompts?
Tools are functions the agent can call (with JSON-schema input). Resources are data the agent can read by URI (files, DB rows, API responses). Prompts are parametrized prompt templates the server exposes. A production MCP server typically ships 5–30 tools, 2–10 resource handlers, and 1–3 prompts.
How do I authenticate an MCP server?
Stdio MCP servers inherit the parent process's credentials. For SSE/HTTP servers, read a per-tenant API key on the handshake (e.g. X-Tenant-Token header) and issue short-lived JWTs scoped to that tenant for downstream calls.
How do I stream progress from a long-running MCP tool?
Use the MCP SDK's notifications/progress channel with the progressToken from req.params._meta. Emit progress events (e.g. 0.42 of 1.0), and clients that render progress will surface a live progress indicator. Recommended for anything over ~5 seconds.
How do I test an MCP server locally?
Start with the MCP Inspector: 'npx @modelcontextprotocol/inspector node dist/server.js'. It opens a web UI for listing and calling tools with raw protocol traffic visible. For CI, use the MCP SDK Client to spawn your server and smoke-test tool invocations on every PR.
Should tool errors throw or return structured errors?
Return structured errors with isError: true. Exceptions look like server crashes to the agent; structured errors let the agent reason about the failure and retry. Always wrap handlers in try/catch and convert exceptions to content with isError: true.
Written by Harsh Rastogi — Full Stack Engineer building production Generative AI systems at Modelia. Connect with me on LinkedIn for more on Shopify, Generative AI, agentic systems, and production engineering.

