TL;DR
You’ll stand up a minimal MCP server, wire two real tools (one read, one write), add inline guardrails, light tracing, and CI tests. By the end, your full-stack team can turn a chat prompt into governed actions across your APIs with predictable cost and clean debuggability.
Who this is for
Full-stack teams that own features end-to-end and want a single, safe way to let agents (and IDE copilots) call your internal tools and services.
What you’ll ship (today)
- A minimal MCP server (Node or Python) running locally over stdio
- Two tools: orders.export (read with masking) and flag.set (write with limits)
- Inline policy: allow, deny, redact + region routing
- Traces: spans stitched for agent → MCP → API with useful attributes
- CI checks: golden flow + one adversarial “don’t allow bulk write without ticket”
- Host config (Claude Desktop or VS Code + Continue) to use your server
Prerequisites
- Node 18+ or Python 3.10+
- A test API to hit (even a mock)
- VS Code (optional), Git, a place to run CI
Architecture at a glance
Agent host (Claude Desktop or VS Code) → MCP server over stdio → inline policy (allow, deny, redact, route) → your internal API, with traces and budget checks recorded at each hop.
Play 0 - Scaffold the server (10–15 min)
Option A: TypeScript (Node)
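A minimal sketch, assuming the official @modelcontextprotocol/sdk package and zod for input schemas. The ORDERS_API_URL and FLAGS_API_URL endpoints, the masking rule, and the 10 percent prod cap are placeholders; the tool names use underscores (orders_export, flag_set) because some hosts only accept letters, digits, hyphens, and underscores in tool names, so map them to the playbook's orders.export and flag.set.

```typescript
// server.ts - minimal MCP server over stdio (sketch; adjust names and endpoints to your stack)
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "team-tools", version: "0.1.0" });

// Read tool: export orders in a date range, masking emails before they reach the model.
server.tool(
  "orders_export",
  {
    from: z.string().describe("Start date, YYYY-MM-DD"),
    to: z.string().describe("End date, YYYY-MM-DD"),
    maskEmails: z.boolean().default(true),
  },
  async ({ from, to, maskEmails }) => {
    const res = await fetch(`${process.env.ORDERS_API_URL}/orders?from=${from}&to=${to}`);
    const orders = (await res.json()) as { id: string; email: string; total: number }[];
    const rows = orders.map((o) => ({
      ...o,
      email: maskEmails ? o.email.replace(/^[^@]+/, "***") : o.email,
    }));
    return { content: [{ type: "text", text: JSON.stringify(rows) }] };
  }
);

// Write tool: set a feature flag percentage, with a hard cap enforced at the server boundary.
server.tool(
  "flag_set",
  {
    flag: z.string(),
    percent: z.number().min(0).max(100),
    env: z.enum(["staging", "prod"]),
  },
  async ({ flag, percent, env }) => {
    if (env === "prod" && percent > 10) {
      return {
        content: [{ type: "text", text: "Denied: prod rollouts above 10 percent need a ticket." }],
        isError: true,
      };
    }
    await fetch(`${process.env.FLAGS_API_URL}/flags/${flag}`, {
      method: "PUT",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ percent, env }),
    });
    return { content: [{ type: "text", text: `Set ${flag} to ${percent}% in ${env}` }] };
  }
);

// stdio transport so the host (Claude Desktop, an IDE) can launch this as a subprocess.
await server.connect(new StdioServerTransport());
```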
Option B: Python
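The Python route mirrors it; a rough sketch assuming the official mcp SDK's FastMCP helper, with stubbed bodies where your real API calls go:

```python
# server.py - minimal MCP server over stdio (sketch; swap the stubs for real API calls)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("team-tools")


@mcp.tool()
def orders_export(start: str, end: str, mask_emails: bool = True) -> list[dict]:
    """Export orders between two dates (YYYY-MM-DD), masking emails by default."""
    orders = [{"id": "o-1", "email": "jane@example.com", "total": 42.0}]  # stub: call your orders API here
    if mask_emails:
        for order in orders:
            _, _, domain = order["email"].partition("@")
            order["email"] = f"***@{domain}"
    return orders


@mcp.tool()
def flag_set(flag: str, percent: int, env: str) -> str:
    """Set a feature flag rollout percentage, with a hard cap on prod."""
    if env == "prod" and percent > 10:
        return "Denied: prod rollouts above 10 percent need a ticket."
    # stub: call your flag service here
    return f"Set {flag} to {percent}% in {env}"


if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```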
Play 1 - Wire a host (Claude Desktop or VS Code)
Claude Desktop (example)
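Register the server in Claude Desktop's claude_desktop_config.json; the server name, script path, and env values below are placeholders to template per developer rather than hard-code:

```json
{
  "mcpServers": {
    "team-tools": {
      "command": "node",
      "args": ["/absolute/path/to/dist/server.js"],
      "env": {
        "ORDERS_API_URL": "http://localhost:4000",
        "FLAGS_API_URL": "http://localhost:4001"
      }
    }
  }
}
```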
Restart Claude Desktop, approve tools, try:
- “Export orders from 2025-08-25 to 2025-08-31 with emails masked”
- “Set flag checkout_v2 to 10 percent in staging”
- “Set flag checkout_v2 to 50 percent in prod” → expect deny without ticket
VS Code + Continue (conceptual)
Open Continue’s settings and register a “custom MCP server” with command and args pointing to your script. Test the same prompts inside the IDE.
Play 2 - Put policy in the path
Start with three rules: deny risky bulk writes, redact PII, route by region. Keep rules fast.
Tip: Start in “shadow mode” for new rules to measure impact before enforcing.
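A minimal sketch of those three rules plus shadow mode, evaluated before a tool handler runs; the field names, internal endpoints, and the flag_set / orders_export tool names are the assumptions carried over from the scaffold above:

```typescript
// policy.ts - three fast, inline rules evaluated before the tool handler runs (sketch).
type ToolCall = { tool: string; args: Record<string, unknown>; actor: string; region?: string; ticket?: string };
type Decision = { action: "allow" | "deny"; reason?: string; route?: string; redact?: boolean };

export function evaluate(call: ToolCall, shadow = false): Decision {
  let decision: Decision = { action: "allow" };

  // 1. Deny risky bulk/prod writes without a change ticket.
  if (call.tool === "flag_set" && call.args.env === "prod" && Number(call.args.percent) > 10 && !call.ticket) {
    decision = { action: "deny", reason: "prod change above 10 percent requires a ticket" };
  }

  // 2. Redact PII on read tools before results reach the model.
  if (call.tool === "orders_export") {
    decision.redact = true;
  }

  // 3. Route by region: EU actors get the EU endpoint (placeholder hostnames).
  decision.route = call.region === "eu" ? "https://api.eu.internal" : "https://api.us.internal";

  // Shadow mode: log what would have happened, but allow, so you can measure impact first.
  if (shadow && decision.action === "deny") {
    console.error("shadow-deny", { tool: call.tool, actor: call.actor, reason: decision.reason });
    return { ...decision, action: "allow" };
  }
  return decision;
}
```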
Play 3 - Identity and scopes for non-humans
Issue short-lived tokens, scope them by tool and purpose, and require just-in-time (JIT) elevation for prod writes.
Elevation example:
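A rough sketch of what that check can look like; the token shape, scope strings, and short TTLs are assumptions to map onto your identity provider:

```typescript
// elevation.ts - JIT elevation check for prod writes (sketch).
type AgentToken = {
  sub: string;                 // non-human identity, e.g. "agent:copilot"
  scopes: string[];            // e.g. ["orders.read", "flags.write:staging"]
  exp: number;                 // unix seconds; keep these short-lived
  elevation?: { ticket: string; scope: string; exp: number }; // ticket-backed, minutes-long grant
};

export function canWriteProd(token: AgentToken, now = Math.floor(Date.now() / 1000)): { ok: boolean; reason?: string } {
  if (token.exp < now) return { ok: false, reason: "token expired" };
  if (!token.scopes.includes("flags.write:prod")) return { ok: false, reason: "missing scope flags.write:prod" };

  // JIT elevation: a grant scoped to exactly this action, tied to a ticket, valid for minutes.
  const grant = token.elevation;
  if (!grant || grant.scope !== "flags.write:prod" || grant.exp < now) {
    return { ok: false, reason: "prod write requires active elevation tied to a ticket" };
  }
  return { ok: true };
}
```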
Play 4 - Add traces you can actually use
Emit spans and attributes that make debugging obvious.
Useful span names
Helpful attributes
Even if you do not run a full OTEL stack yet, log these attributes next to each tool call.
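A sketch of that fallback: one structured log line per tool call, written to stderr because stdout is the stdio transport. The span and field names (mcp.tool.call, mcp.decision, pii.count) are suggestions, not a standard:

```typescript
// trace.ts - structured per-call log you can grep today and turn into OTEL spans later (sketch).
// Span names you might settle on: agent.plan -> mcp.tool.call -> api.request
export async function traced<T>(
  record: { actor: string; "mcp.tool": string; "mcp.decision": "allow" | "deny"; "mcp.deny_reason"?: string; "pii.count": number },
  fn: () => Promise<T>
): Promise<T> {
  const start = Date.now();
  let error: string | undefined;
  try {
    return await fn();
  } catch (err) {
    error = String(err);
    throw err;
  } finally {
    // stderr, because stdout carries the MCP stdio protocol
    console.error(JSON.stringify({ "span.name": "mcp.tool.call", ...record, duration_ms: Date.now() - start, error }));
  }
}

// Usage inside a tool handler:
// await traced({ actor: "agent:copilot", "mcp.tool": "orders_export", "mcp.decision": "allow", "pii.count": 3 }, () => doExport(args));
```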
Play 5 - Tests that prevent bad days
Add two test classes to CI:
- Golden flow: “Export orders with email masked” → expect count and masked emails.
- Adversarial: “Set checkout_v2 to 50 percent in prod” without ticket → expect deny and reason.
Example (pseudo-JS):
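A runnable version of the pseudo-JS, using Node 18+'s built-in test runner; callTool is a hypothetical helper that invokes your tool handler directly or through an MCP client:

```typescript
// tools.test.ts - golden + adversarial checks in CI (sketch).
import { test } from "node:test";
import assert from "node:assert/strict";
import { callTool } from "./test-helpers.js"; // hypothetical helper you own

test("golden: export orders with emails masked", async () => {
  const result = await callTool("orders_export", { from: "2025-08-25", to: "2025-08-31", maskEmails: true });
  const rows = JSON.parse(result.content[0].text);
  assert.ok(rows.length > 0, "expected at least one order");
  assert.ok(rows.every((r: { email: string }) => r.email.startsWith("***")), "emails must be masked");
});

test("adversarial: 50 percent prod flag change without ticket is denied with a reason", async () => {
  const result = await callTool("flag_set", { flag: "checkout_v2", percent: 50, env: "prod" });
  assert.equal(result.isError, true);
  assert.match(result.content[0].text, /ticket/i);
});
```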
Play 6 - Rollout patterns that won’t wake you up at 2am
- Roll flags to 10 percent first, auto-rollback on error spike
- Require a ticket for prod changes >10 percent
- Use idempotency keys for any write tool to avoid double writes (see the sketch after this list)
- Put rate limits and concurrency caps on hot tools
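For the idempotency bullet, a minimal sketch: derive the key from the logical change so a retried call cannot apply twice. The Idempotency-Key header and in-process cache are placeholders; real services keep the key store server-side:

```typescript
// idempotent-write.ts - same logical change => same key => at most one applied write (sketch).
import { createHash } from "node:crypto";

// Placeholder in-process cache; production should record idempotency keys on the API side.
const applied = new Map<string, { status: number }>();

export async function setFlagIdempotent(flag: string, percent: number, env: string): Promise<{ status: number }> {
  // Key derived from the change itself, so a retry of the same request produces the same key.
  const key = createHash("sha256").update(`flag.set:${flag}:${percent}:${env}`).digest("hex");
  const prior = applied.get(key);
  if (prior) return prior; // duplicate plan step or retry: do not write again

  const res = await fetch(`${process.env.FLAGS_API_URL}/flags/${flag}`, {
    method: "PUT",
    headers: { "content-type": "application/json", "Idempotency-Key": key },
    body: JSON.stringify({ percent, env }),
  });
  const outcome = { status: res.status };
  applied.set(key, outcome);
  return outcome;
}
```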
Play 7 - Data privacy in motion
- Redact sensitive fields in prompts and outputs (email, PAN, tokens); see the sketch after this list
- Route EU data to EU storage; block unknown vendors
- Keep a “deny by default” for new external egress until reviewed
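For the redaction and egress bullets, a rough sketch; the patterns are deliberately naive examples and the allowlisted hosts are placeholders:

```typescript
// privacy.ts - simple field redaction + deny-by-default egress (sketch).
const PATTERNS: Record<string, RegExp> = {
  email: /[\w.+-]+@[\w-]+\.[\w.-]+/g,
  pan: /\b\d{13,16}\b/g,                    // naive card-number match; tighten for your data
  token: /\b(sk|tok|key)-[A-Za-z0-9]{16,}\b/g, // example secret-like pattern
};

export function redact(text: string): { clean: string; piiCount: number } {
  let clean = text;
  let piiCount = 0;
  for (const [name, re] of Object.entries(PATTERNS)) {
    clean = clean.replace(re, () => {
      piiCount += 1;
      return `[${name} redacted]`;
    });
  }
  return { clean, piiCount };
}

// Deny by default: only reviewed hosts may be called from tool handlers.
const EGRESS_ALLOWLIST = new Set(["api.eu.internal", "api.us.internal"]);

export function assertEgressAllowed(url: string): void {
  const host = new URL(url).hostname;
  if (!EGRESS_ALLOWLIST.has(host)) throw new Error(`egress to ${host} blocked: not reviewed`);
}
```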
Play 8 - Cost and loop control
- Budgets per agent and per tool (e.g., daily LLM $50 for agent:copilot)
- Stop rules for long plans; alert on bursty retries
- Per-tool rate limits and P95 latency SLOs
Budget guard example:
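A minimal sketch; the $50 cap, the in-memory ledger, and the per-call cost estimate are placeholders you would back with real metering:

```typescript
// budget.ts - per-agent daily spend guard (sketch; back the ledger with real metering in production).
const DAILY_LIMIT_USD: Record<string, number> = { "agent:copilot": 50 };
const ledger = new Map<string, { day: string; spentUsd: number }>();

export function chargeOrDeny(actor: string, estimatedUsd: number): { ok: boolean; reason?: string } {
  const day = new Date().toISOString().slice(0, 10);
  const prior = ledger.get(actor);
  const entry = prior && prior.day === day ? prior : { day, spentUsd: 0 };
  const limit = DAILY_LIMIT_USD[actor] ?? 10; // unknown agents get a small default budget

  if (entry.spentUsd + estimatedUsd > limit) {
    return { ok: false, reason: `daily budget of $${limit} exceeded for ${actor}` };
  }
  entry.spentUsd += estimatedUsd;
  ledger.set(actor, entry);
  return { ok: true };
}

// Usage before each LLM or tool call:
// if (!chargeOrDeny("agent:copilot", 0.12).ok) { stop the plan and alert }
```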
Ops checklist (pin this to your repo)
- Stdio server runs locally; host config checked in (paths templated)
- Tools have examples and schemas; inputs validated
- Policy rules in repo; shadow mode for new ones; tests pass
- Traces or structured logs with actor, tool, decision, pii.count
- Idempotency keys on writes; rate limits and caps applied
- Golden + adversarial tests in CI; PR template asks for both
- README: how to run server, host config, common prompts
Troubleshooting quickies
- Host does not see tools: wrong path or missing stdin/stdout transport; restart the host.
- Tool runs but returns nothing: the return envelope must include content.
- Policy blocks everything: switch new rules to shadow mode, print debug fields.
- Double writes: add idempotency key, check retries and timeouts.
- Telemetry spam: sample spans or log only on errors at first.
What good looks like after 2–3 sprints
- New internal tool wrapped in 1–2 days, with examples and tests
- 95 percent of actions have signed traces or structured logs
- Inline policy denies at least one risky action per month with low false positives
- No surprise LLM or API bills; budgets and caps visible in dashboards
- Other teams reuse your tools instead of building one-offs
How Levo can help
Levo gives your team production-grade rails without the yak-shave: mesh visibility stitched by identity, inline policy and redaction in the path, signed evidence you can export, exploit-aware tests for CI, and budget guards. You keep building features; Levo keeps actions safe, observable, and within limits.
Interested in seeing how this looks in practice? Book a demo.
Learn more: Levo MCP server use case → https://www.levo.ai/use-case/mcp-server