
September 5, 2025

MCP Server: Engineering Team’s Playbook

Joyjeet Dan

Head of Demand Generation


TL;DR

You’ll stand up a minimal MCP server, wire two real tools (one read, one write), add inline guardrails, light tracing, and CI tests. By the end, your full-stack team can turn a chat prompt into governed actions across your APIs with predictable cost and clean debuggability.

Who this is for

Full-stack teams that own features end-to-end and want a single, safe way to let agents (and IDE copilots) call your internal tools and services.

What you’ll ship (today)

A minimal MCP server (Node or Python) running locally over stdio
Two tools: orders.export (read with masking) and flag.set (write with limits)
Inline policy: allow, deny, redact + region routing
Traces: spans stitched for agent → MCP → API with useful attributes
CI checks: golden flow + one adversarial “don’t allow bulk write without ticket”
Host config (Claude Desktop or VS Code + Continue) to use your server

Prerequisites

Node 18+ or Python 3.10+
A test API to hit (even a mock)
VS Code (optional), Git, a place to run CI

Architecture at a glance

Play 0 - Scaffold the server (10–15 min)

Option A: TypeScript (Node)

BASH

npm init -y
npm pkg set type=module
npm i @modelcontextprotocol/sdk zod
npm i -D typescript ts-node

TYPESCRIPT

// src/server.ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const mcp = new McpServer({ name: "team-tools", version: "1.0.0" });

// orders.export (read): returns stub rows, masking the fields named in `mask`
mcp.registerTool("orders.export", {
  title: "Export orders",
  description: "Export orders for a date range with optional masking",
  // registerTool takes a Zod shape here, not a raw JSON Schema object
  inputSchema: {
    from: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
    to: z.string().regex(/^\d{4}-\d{2}-\d{2}$/),
    mask: z.array(z.string()).optional()
  }
}, async ({ from, to, mask = [] }) => {
  // Stub data; replace with a call to your orders API filtered by from/to
  const rows = [
    { orderId: "1001", email: "a@ex.com", total: 30.5 },
    { orderId: "1002", email: "b@ex.com", total: 18.0 }
  ];
  const masked = rows.map(r => ({
    ...r,
    email: mask.includes("email") ? "***@***" : r.email
  }));
  // MCP has no "json" content type, so serialize the payload as text
  return { content: [{ type: "text", text: JSON.stringify({ count: masked.length, rows: masked }) }] };
});

// flag.set (write): refuses large prod rollouts that lack a change ticket
mcp.registerTool("flag.set", {
  title: "Set feature flag",
  description: "Set rollout percentage for a feature flag in an environment",
  inputSchema: {
    name: z.string(),
    env: z.enum(["dev", "staging", "prod"]),
    rolloutPct: z.number().min(0).max(100),
    ticket: z.string().optional()
  }
}, async ({ name, env, rolloutPct, ticket }) => {
  if (env === "prod" && rolloutPct > 10 && !ticket) {
    // isError tells the host the call failed instead of returning a normal result
    return { isError: true, content: [{ type: "text", text: "Denied: prod change >10% requires change ticket" }] };
  }
  return { content: [{ type: "text", text: `Flag ${name} set to ${rolloutPct}% in ${env}` }] };
});

await mcp.connect(new StdioServerTransport());
// stdout carries the protocol on stdio, so log to stderr
console.error("MCP server ready");

Option B: Python

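BASH

pip install mcp

PYTHON

# server.py
from mcp.server.fastmcp import FastMCP

# FastMCP is the high-level server API in the official Python SDK
mcp = FastMCP("team-tools")

# Dotted names match the rest of this playbook; if your host rejects them,
# drop name= and the function names (orders_export, flag_set) are used instead.
@mcp.tool(name="orders.export")
def orders_export(from_: str, to: str, mask: list[str] | None = None) -> dict:
    """Export orders for a date range with optional masking."""
    # `from` is a Python keyword, so the argument is exposed as `from_`
    rows = [{"orderId": "1001", "email": "a@ex.com", "total": 30.5},
            {"orderId": "1002", "email": "b@ex.com", "total": 18.0}]
    mask = mask or []
    for r in rows:
        if "email" in mask:
            r["email"] = "***@***"
    return {"count": len(rows), "rows": rows}

@mcp.tool(name="flag.set")
def flag_set(name: str, env: str, rolloutPct: float, ticket: str | None = None) -> str:
    """Set rollout percentage for a feature flag in an environment."""
    if env == "prod" and rolloutPct > 10 and not ticket:
        return "Denied: prod change >10% requires change ticket"
    return f"Flag {name} set to {rolloutPct}% in {env}"

if __name__ == "__main__":
    mcp.run()  # stdio is the default transport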

Play 1 - Wire a host (Claude Desktop or VS Code)

Claude Desktop (example)

JSON

{
  "mcpServers": {
    "team-tools-node": {
      "command": "node",
      "args": ["--loader","ts-node/esm","/ABS/PATH/src/server.ts"]
    },
    "team-tools-py": {
      "command": "python",
      "args": ["/ABS/PATH/server.py"]
    }
  }
}

Restart Claude Desktop, approve tools, try:

“Export orders from 2025-08-25 to 2025-08-31 with emails masked”
“Set flag checkout_v2 to 10 percent in staging”
“Set flag checkout_v2 to 50 percent in prod” → expect deny without ticket

VS Code + Continue (conceptual)

Open Continue’s settings and register a “custom MCP server” with command and args pointing to your script. Test the same prompts inside the IDE.

Play 2 - Put policy in the path

Start with three rules: deny risky bulk, redact PII, route by region. Keep rules fast.

YAML

# policy.yaml
- rule: deny-prod-flag-over-10-without-ticket
  when: { tool: "flag.set", cond: "env == 'prod' && rolloutPct > 10 && !ticket" }
  decision: deny
  reason: "Prod change >10% requires ticket"

- rule: redact-email-on-orders-export
  when: { tool: "orders.export" }
  decision: allow
  redact:
    - field: "email"
      mode: "mask"

- rule: route-eu-orders
  when: { tool: "orders.export", cond: "subject.region == 'EU'" }
  route: "eu-west"

Tip: Start in “shadow mode” for new rules to measure impact before enforcing.
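To make the three rules concrete, here is a minimal Python sketch of an evaluator that could sit in front of each tool call. It is an illustration, not a real policy engine: `Rule`, `RULES`, `evaluate`, and the `shadow` flag are hypothetical names, and conditions are written as Python lambdas instead of parsing the `cond` strings from policy.yaml.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    """One policy rule. All names here are illustrative, not from a real library."""
    name: str
    tool: str
    decision: str = "allow"  # "allow" or "deny"
    cond: Callable[[dict, dict], bool] = lambda args, subject: True
    reason: str = ""
    redact: list = field(default_factory=list)
    route: Optional[str] = None
    shadow: bool = False  # shadow rules are recorded but never enforced


# Python equivalents of the three policy.yaml rules
RULES = [
    Rule(
        name="deny-prod-flag-over-10-without-ticket",
        tool="flag.set",
        decision="deny",
        cond=lambda a, s: a.get("env") == "prod"
        and a.get("rolloutPct", 0) > 10
        and not a.get("ticket"),
        reason="Prod change >10% requires ticket",
    ),
    Rule(name="redact-email-on-orders-export", tool="orders.export", redact=["email"]),
    Rule(
        name="route-eu-orders",
        tool="orders.export",
        cond=lambda a, s: s.get("region") == "EU",
        route="eu-west",
    ),
]


def evaluate(tool, args, subject, rules=RULES):
    """Return the combined outcome of every rule that matches this call."""
    outcome = {"decision": "allow", "reason": "", "redact": [], "route": None, "shadow_hits": []}
    for rule in rules:
        if rule.tool != tool or not rule.cond(args, subject):
            continue
        if rule.shadow:
            # Measure impact only: note the hit and keep going
            outcome["shadow_hits"].append(rule.name)
            continue
        if rule.decision == "deny":
            return {**outcome, "decision": "deny", "reason": rule.reason}
        outcome["redact"].extend(rule.redact)
        if rule.route:
            outcome["route"] = rule.route
    return outcome
```

A tool handler would call `evaluate(...)` first, return the reason on a deny, and otherwise apply the `redact` and `route` fields before touching the backend. Flipping `shadow=True` on a new rule lets you count its hits in `shadow_hits` without blocking anyone.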

Play 3 - Identity and scopes for non-humans

Issue short-lived tokens, scope them by tool and purpose, and require just-in-time (JIT) elevation for prod writes.

JSON

{
  "principal": "agent:copilot",
  "allow": ["orders.read", "flags.write:staging"],
  "deny": ["flags.write:prod"],
  "ttlSeconds": 600
}

Elevation example:

JSON

{
  "principal": "agent:release",
  "request": "flags.write:prod",
  "reason": "hotfix-1234",
  "approver": "oncall-sre",
  "ttlSeconds": 900
}
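One way to read those two documents is as a grant check plus an approval step. The Python sketch below assumes that interpretation; `Grant`, `is_permitted`, and `elevate` are hypothetical names, and a real system would verify signed tokens instead of trusting plain dictionaries.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """A short-lived set of scopes for one principal (illustrative shape)."""
    principal: str
    allow: tuple
    deny: tuple = ()
    ttl_seconds: int = 600
    issued_at: float = 0.0


def is_permitted(grant, principal, scope, now=None):
    """Deny wins over allow, and expired grants permit nothing."""
    now = time.time() if now is None else now
    if grant.principal != principal:
        return False
    if now >= grant.issued_at + grant.ttl_seconds:
        return False
    if scope in grant.deny:
        return False
    return scope in grant.allow


def elevate(request, approved_by, issued_at):
    """Turn an approved elevation request into a grant for that one scope."""
    if approved_by != request["approver"]:
        raise PermissionError("elevation was not approved by the named approver")
    return Grant(
        principal=request["principal"],
        allow=(request["request"],),
        ttl_seconds=request["ttlSeconds"],
        issued_at=issued_at,
    )


# The baseline grant and the elevation request from the JSON examples
copilot = Grant(
    principal="agent:copilot",
    allow=("orders.read", "flags.write:staging"),
    deny=("flags.write:prod",),
    ttl_seconds=600,
    issued_at=0.0,
)
hotfix_request = {
    "principal": "agent:release",
    "request": "flags.write:prod",
    "reason": "hotfix-1234",
    "approver": "oncall-sre",
    "ttlSeconds": 900,
}
release = elevate(hotfix_request, approved_by="oncall-sre", issued_at=0.0)
```

The elevated grant covers exactly one scope and expires on its own, so nothing needs to be revoked after the hotfix.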

Play 4 - Add traces you can actually use

Emit spans and attributes that make debugging obvious.

Useful span names

TEXT

agent.plan
mcp.tool.invoke
service.api.call
policy.evaluate
dlp.redact

Helpful attributes

TEXT

actor.id=agent:copilot
actor.type=agent|mcp|service
tool.name=orders.export
scope.grants=flags.write:staging
policy.decision=allow|deny|redact
pii.count=2

Even if you do not run a full OpenTelemetry (OTel) stack yet, log these attributes next to each tool call.
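Without a tracing backend, a structured log line per span is enough to start. The Python sketch below uses only the standard library; `span` and its `sink` parameter are hypothetical helpers, not part of any tracing SDK. It writes to stderr by default because a stdio server's stdout carries the protocol.

```python
import json
import sys
import time
import uuid
from contextlib import contextmanager


@contextmanager
def span(name, trace_id, attributes, sink=None):
    """Emit one JSON line per span with its attributes, status, and duration."""
    record = {"span": name, "trace_id": trace_id, "span_id": uuid.uuid4().hex[:16], **attributes}
    start = time.perf_counter()
    try:
        yield record  # callers can add attributes while the span is open
        record.setdefault("status", "ok")
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        line = json.dumps(record, sort_keys=True)
        if sink is None:
            print(line, file=sys.stderr)  # never stdout on a stdio server
        else:
            sink.append(line)  # lets tests capture the output


# Usage: one span around a tool call, sharing a trace_id with the agent's plan
lines = []
trace_id = uuid.uuid4().hex
with span(
    "mcp.tool.invoke",
    trace_id,
    {"actor.id": "agent:copilot", "actor.type": "agent", "tool.name": "orders.export"},
    sink=lines,
) as rec:
    rec["policy.decision"] = "redact"
    rec["pii.count"] = 2
```

Passing the same `trace_id` to the `policy.evaluate` and `service.api.call` spans is what lets you stitch agent → MCP → API later, whichever backend you adopt.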

Play 5 - Tests that prevent bad days

Add two test classes to CI:

Golden flow: “Export orders with email masked” → expect count and masked emails.
Adversarial: “Set checkout_v2 to 50 percent in prod” without ticket → expect deny and reason.

Example (pseudo-JS):

it("denies risky prod flag change without ticket", async () => {
  const res = await callTool("flag.set", { name: "checkout_v2", env: "prod", rolloutPct: 50 });
  expect(res.text).toMatch(/Denied/);
});
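For the Python option, both test classes could look like the pytest-style sketch below. The two handlers are stand-ins that mirror the Option B tool bodies so the example runs on its own; in a real suite you would import them from server.py or call the server through an MCP client.

```python
def orders_export(from_, to, mask=None):
    """Stand-in for the orders.export handler (stub data, as in Option B)."""
    rows = [
        {"orderId": "1001", "email": "a@ex.com", "total": 30.5},
        {"orderId": "1002", "email": "b@ex.com", "total": 18.0},
    ]
    mask = mask or []
    for r in rows:
        if "email" in mask:
            r["email"] = "***@***"
    return {"count": len(rows), "rows": rows}


def flag_set(name, env, rolloutPct, ticket=None):
    """Stand-in for the flag.set handler."""
    if env == "prod" and rolloutPct > 10 and not ticket:
        return "Denied: prod change >10% requires change ticket"
    return f"Flag {name} set to {rolloutPct}% in {env}"


def test_golden_export_masks_email():
    # Golden flow: expect a count and masked emails
    res = orders_export("2025-08-25", "2025-08-31", mask=["email"])
    assert res["count"] == 2
    assert all(r["email"] == "***@***" for r in res["rows"])


def test_adversarial_prod_flag_without_ticket_is_denied():
    # Adversarial: expect a deny and a reason that names the missing ticket
    res = flag_set("checkout_v2", "prod", 50)
    assert res.startswith("Denied")
    assert "ticket" in res
```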

Play 6 - Rollout patterns that won’t wake you up at 2am

Roll flags to 10 percent first, auto-rollback on error spike
Require ticket for prod changes >10 percent
Use idempotency keys for any write tool to avoid double writes
Put rate limits and concurrency caps on hot tools
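Of these patterns, idempotency keys are the easiest to show in a few lines. The Python sketch below wraps a write function so that a retried call with the same key returns the stored result instead of writing again. `IdempotentTool` is a hypothetical helper, and the in-memory dictionary is an assumption for brevity; a real deployment would use a shared store with expiry.

```python
import hashlib
import json


class IdempotentTool:
    """Replay-safe wrapper around a write function (illustrative only)."""

    def __init__(self, write_fn):
        self._write_fn = write_fn
        self._seen = {}  # idempotency key -> (argument fingerprint, result)

    def call(self, idempotency_key, **args):
        fingerprint = hashlib.sha256(
            json.dumps(args, sort_keys=True).encode()
        ).hexdigest()
        if idempotency_key in self._seen:
            seen_fingerprint, result = self._seen[idempotency_key]
            if seen_fingerprint != fingerprint:
                # Same key with different arguments is a caller bug, not a retry
                raise ValueError("idempotency key reused with different arguments")
            return result
        result = self._write_fn(**args)
        self._seen[idempotency_key] = (fingerprint, result)
        return result


# Usage: the retry with key "req-1" does not write a second time
writes = []


def set_flag(name, env, rolloutPct):
    writes.append((name, env, rolloutPct))
    return f"Flag {name} set to {rolloutPct}% in {env}"


tool = IdempotentTool(set_flag)
first = tool.call("req-1", name="checkout_v2", env="staging", rolloutPct=10)
retry = tool.call("req-1", name="checkout_v2", env="staging", rolloutPct=10)
```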

Play 7 - Data privacy in motion

Redact sensitive fields in prompts and outputs (email, PAN, tokens)
Route EU data to EU storage; block unknown vendors
Keep a “deny by default” for new external egress until reviewed
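A first pass at the redaction bullet can be done with pattern matching, as in the Python sketch below. The patterns are assumptions chosen for illustration (a simple email regex, `Bearer` tokens, and 13 to 19 digit card numbers confirmed with a Luhn check); they will miss some formats, so treat this as a starting point and not a complete DLP layer. The returned count is the kind of value you could log as `pii.count`.

```python
import re

# Illustrative patterns; tune them for the data your tools actually handle
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TOKEN_RE = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]+=*")
PAN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits):
    """Luhn checksum, used to avoid masking arbitrary long numbers as cards."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def redact(text):
    """Return (redacted_text, number_of_redactions)."""
    count = 0

    def replace_with(label):
        def _sub(match):
            nonlocal count
            count += 1
            return label
        return _sub

    def pan_sub(match):
        nonlocal count
        digits = re.sub(r"\D", "", match.group())
        if not luhn_ok(digits):
            return match.group()  # not a plausible card number, leave it alone
        count += 1
        return "[PAN]"

    text = EMAIL_RE.sub(replace_with("[EMAIL]"), text)
    text = TOKEN_RE.sub(replace_with("[TOKEN]"), text)
    text = PAN_RE.sub(pan_sub, text)
    return text, count
```

Run it on tool inputs before they reach a prompt and on tool outputs before they leave the server, so both directions are covered.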

Play 8 - Cost and loop control

Budgets per agent and per tool (e.g., daily LLM $50 for agent:copilot)
Stop rules for long plans; alert on bursty retries
Per-tool rate limits and P95 latency SLOs

Budget guard example:

JSON

{
  "principal": "agent:copilot",
  "limits": { "rpm": 60, "concurrency": 5, "daily_cost_usd": 50 }
}
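A guard that enforces those three limits might look like the Python sketch below. `BudgetGuard`, `acquire`, and `release` are hypothetical names, the state is held in memory for a single process, and the daily budget resets on UTC day boundaries by assumption; treat it as a sketch of the checks, not a production limiter.

```python
import time
from collections import deque


class BudgetGuard:
    """Per-principal limits: requests per minute, concurrency, daily spend."""

    def __init__(self, rpm, concurrency, daily_cost_usd, clock=time.time):
        self.rpm = rpm
        self.concurrency = concurrency
        self.daily_cost_usd = daily_cost_usd
        self._clock = clock  # injectable so tests can control time
        self._calls = deque()  # start times of calls in the last 60 seconds
        self._in_flight = 0
        self._spent = 0.0
        self._day = None

    def acquire(self, est_cost_usd=0.0):
        """Return (True, "ok") or (False, name_of_the_limit_that_was_hit)."""
        now = self._clock()
        day = int(now // 86400)
        if day != self._day:
            self._day, self._spent = day, 0.0  # new day, fresh budget
        while self._calls and now - self._calls[0] >= 60:
            self._calls.popleft()
        if len(self._calls) >= self.rpm:
            return False, "rpm"
        if self._in_flight >= self.concurrency:
            return False, "concurrency"
        if self._spent + est_cost_usd > self.daily_cost_usd:
            return False, "daily_cost_usd"
        self._calls.append(now)
        self._in_flight += 1
        return True, "ok"

    def release(self, actual_cost_usd=0.0):
        """Call when the tool or LLM call finishes, with what it really cost."""
        self._in_flight -= 1
        self._spent += actual_cost_usd


# Built from the budget guard JSON for agent:copilot
limits = {"rpm": 60, "concurrency": 5, "daily_cost_usd": 50}
copilot_guard = BudgetGuard(**limits)
```

Returning the name of the limit that fired makes it easy to log the reason next to the tool call and to alert on bursty retries.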

Ops checklist (pin this to your repo)

Stdio server runs locally; host config checked in (paths templated)
Tools have examples and schemas; inputs validated
Policy rules in repo; shadow mode for new ones; tests pass
Traces or structured logs with actor, tool, decision, pii.count
Idempotency keys on writes; rate limits and caps applied
Golden + adversarial tests in CI; PR template asks for both
README: how to run server, host config, common prompts

Troubleshooting quickies

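Host does not see tools: check for a wrong path or a missing stdio transport, then restart the host. On stdio, anything the server prints to stdout other than protocol messages can also break the connection, so send logs to stderr.
Tool runs but returns nothing: the tool result must include a content array.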
Policy blocks everything: switch new rules to shadow mode, print debug fields.
Double writes: add idempotency key, check retries and timeouts.
Telemetry spam: sample spans or log only on errors at first.

What good looks like after 2–3 sprints

New internal tool wrapped in 1–2 days, with examples and tests
95 percent of actions have signed traces or structured logs
Inline policy denies at least one risky action per month with low false positives
No surprise LLM or API bills; budgets and caps visible in dashboards
Other teams reuse your tools instead of building one-offs

How Levo can help

Levo gives your team production-grade rails without the yak-shave: mesh visibility stitched by identity, inline policy and redaction in the path, signed evidence you can export, exploit-aware tests for CI, and budget guards. You keep building features; Levo keeps actions safe, observable, and within limits.

Interested in seeing how this looks in practice? Book a demo.

Learn more: Levo MCP server use case → https://www.levo.ai/use-case/mcp-server

Conclusion

MCP gives full-stack teams a practical way to turn prompts into real, governed actions across internal APIs. By starting with a minimal server and adding inline guardrails, scoped identity, tracing, and CI tests, teams can safely enable agent-driven workflows without losing control or visibility.

The focus should be on implementing core foundations early such as schema validation, policy enforcement, and basic observability. Once these are in place, teams can scale with confidence by adding cost controls, adversarial testing, and rollout safeguards. This approach ensures faster delivery, predictable costs, and systems that are easy to debug and trust in production.

FAQs

What is an MCP server and why is it useful?
An MCP server exposes internal tools that AI agents or copilots can call to perform actions. It creates a controlled layer that connects prompts to real systems safely.

How do MCP tools like orders.export and flag.set help teams?
These tools represent real actions such as reading data or updating configurations. They allow teams to standardize how agents interact with APIs using structured inputs and outputs.

Why is inline policy important in MCP systems?
Inline policy enforces rules like allow, deny, and redact during execution. This prevents unsafe actions before they reach backend systems.

What role does tracing play in MCP workflows?
Tracing captures the full flow from agent to tool to API with useful metadata. It helps teams debug issues quickly and understand system behavior clearly.

How do CI tests improve MCP reliability?
CI tests validate both normal and adversarial scenarios. They ensure that expected actions work correctly and unsafe actions are blocked before deployment.

How can teams control cost and performance in MCP setups?
Teams can use budgets, rate limits, and concurrency caps to manage usage. Stop rules and monitoring help prevent excessive retries and unexpected costs.
