September 4, 2025

API Security: Engineering Team’s Playbook

Buchi Reddy B

CEO & Founder at LEVO

Levo API Security Research Panel

Research Team

TL;DR

Treat security as product requirements: contract-first, short-lived tokens, object-level auth, schema validation, rate limits, replay guards, evidence by default.
Wire small, reliable gates into CI and portable policies at edge + service.
Start with 5 changes in 30 days: API-BOM, JWT aud/iss checks, object-ownership checks on money + identity flows, write-route rate limits + normalization, webhook signatures with 5-minute replay window.

Who this is for, and how to use it

Engineers, staff ICs, tech leads, and platform folks who build and run APIs. Use this playbook to scaffold new services, retrofit legacy ones, and standardize controls across languages. Skim Quickstart to land the basics, then wire the CI jobs and runtime policies. Reference the snippets whenever you open a PR.

Day-0 Quickstart (copy these into your next PR)

Add a contract

For REST: OpenAPI 3.1 with securitySchemes, strict request/response schemas, and explicit error shapes.
For GraphQL: schema SDL checked in, persisted queries only.
For gRPC: .proto files with method allowlists in the proxy.

Add token checks

Verify iss, aud, exp, and signature; rotate keys; prefer short TTLs.
Reject tokens with the wrong aud even when the signature and expiry are valid.
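
The claim checks above can be sketched as a small function that runs after a JWT library has verified the signature. This is a minimal illustration, not a full validator: `check_claims`, its reason strings, and the 30-second leeway are assumptions introduced here.

```python
import time
from typing import Any, Mapping, Optional


def check_claims(claims: Mapping[str, Any], issuer: str, audience: str,
                 now: Optional[float] = None, leeway: int = 30) -> Optional[str]:
    """Return None when the claims are acceptable, else a short reason.

    Assumes the token signature was already verified by a JWT library.
    """
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        return "wrong_issuer"
    # RFC 7519 allows "aud" to be a single string or a list of strings.
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, list):
        audiences = aud
    else:
        audiences = []
    if audience not in audiences:
        return "wrong_audience"
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + leeway:
        return "expired"
    return None
```

Handling both the string and list forms of `aud` matters because a plain equality check rejects valid multi-audience tokens.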

Enforce object-level authorization (BOLA stop)

Every read/write checks tenant + subject ownership, near the data.
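
One way to keep this check consistent is a single shared helper that every handler calls right after loading a record. The sketch below assumes records carry `tenant_id` and `owner` fields and that the verified token yields `sub` and `tid`; the `Principal`, `Forbidden`, and `require_owner` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Principal:
    sub: str  # subject from the verified token
    tid: str  # tenant ID from the verified token


class Forbidden(Exception):
    """Raised when the caller may not access the requested object."""


def require_owner(principal: Principal, record: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    # Treat "missing" and "not yours" the same so callers cannot probe for IDs.
    if (record is None
            or record.get("tenant_id") != principal.tid
            or record.get("owner") != principal.sub):
        raise Forbidden("forbidden")
    return record
```

A handler would call `require_owner(principal, db_lookup(id))` and map `Forbidden` to a 403 response.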

Rate-limit and normalize

Soft limits on reads, stricter on writes; collapse near-dupe payloads.
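
The read/write split can be sketched as a token bucket keyed by principal, method, and route. This is an in-memory illustration under assumed budgets; `READ_LIMIT`, `WRITE_LIMIT`, and `RateLimiter` are hypothetical names, and a real deployment would back the buckets with a shared store.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical budgets: (bucket capacity, tokens refilled per second).
READ_LIMIT = (50, 25.0)
WRITE_LIMIT = (10, 2.0)
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


class RateLimiter:
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # key -> (tokens remaining, time of last update)
        self._buckets: Dict[Tuple[str, str, str], Tuple[float, float]] = {}

    def allow(self, principal: str, method: str, route: str) -> bool:
        method = method.upper()
        capacity, refill = WRITE_LIMIT if method in WRITE_METHODS else READ_LIMIT
        key = (principal, method, route)
        now = self._clock()
        tokens, last = self._buckets.get(key, (float(capacity), now))
        # Refill in proportion to elapsed time, capped at the bucket size.
        tokens = min(float(capacity), tokens + (now - last) * refill)
        if tokens < 1:
            self._buckets[key] = (tokens, now)
            return False
        self._buckets[key] = (tokens - 1, now)
        return True
```

Keying on the principal rather than the client address keeps one noisy tenant from consuming another tenant's budget.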

Replay + webhook hygiene

HMAC signature with timestamp; 5-minute window; idempotency keys.

Log decisions, not secrets

Structured logs with correlation IDs; mask PII; short retention.

Threat model (engineer edition)

IDOR/BOLA: object ownership missing or inconsistent.
Mass assignment: extra fields accepted and honored.
Token misuse: missing aud/iss checks; long TTLs; reuse across services.
Parser & resource exhaustion: large, nested, or weird payloads.
Webhook replay/spoof: no signature check or long replay window.
Version drift: deprecated routes still live; unowned endpoints (“shadow/zombie”).

Fix with contract-first design, strict types, portable policies, and negative tests on every change.
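
For the parser and resource exhaustion item, a bounded JSON parse is a cheap first guard in front of schema validation. The sketch below uses only the standard library; the limits and the `parse_bounded` and `PayloadRejected` names are assumptions to tune per route.

```python
import json
from typing import Any

MAX_BYTES = 64 * 1024  # hypothetical limits; tune per route
MAX_DEPTH = 8
MAX_ITEMS = 1000


class PayloadRejected(ValueError):
    """Raised when a payload exceeds size, depth, or item bounds."""


def _check(node: Any, depth: int) -> None:
    if isinstance(node, dict):
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return  # scalars add no depth
    if depth > MAX_DEPTH:
        raise PayloadRejected("too_deep")
    if len(children) > MAX_ITEMS:
        raise PayloadRejected("too_many_items")
    for child in children:
        _check(child, depth + 1)


def parse_bounded(raw: bytes) -> Any:
    # Check the byte length before parsing so oversized bodies cost nothing.
    if len(raw) > MAX_BYTES:
        raise PayloadRejected("too_large")
    try:
        doc = json.loads(raw)
    except (ValueError, RecursionError) as exc:
        raise PayloadRejected("invalid_json") from exc
    _check(doc, 1)
    return doc
```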

Design: contract-first (REST)

YAML

openapi: 3.1.0
info: { title: Payments API, version: "1.2.0" }
components:
  securitySchemes:
    oauth2:
      type: oauth2
      flows:
        clientCredentials:
          tokenUrl: https://idp.example.com/oauth2/token
          scopes:
            payments.read: "Read payments"
            payments.write: "Create payments"
  schemas:
    PaymentCreate:
      type: object
      additionalProperties: false
      required: [amount, currency]
      properties:
        amount: { type: integer, minimum: 1 }
        currency: { type: string, enum: [USD, EUR, GBP] }
security: [{ oauth2: [payments.read] }]
paths:
  /v1/payments:
    post:
      security: [{ oauth2: [payments.write] }]
      requestBody:
        required: true
        content:
          application/json: { schema: { $ref: "#/components/schemas/PaymentCreate" } }
      responses:
        "201": { description: Created }
        "400": { description: Schema violation }
        "401": { description: Bad token }
        "403": { description: Forbidden }

AuthN & AuthZ: practical patterns

Node (Express) - JWT checks + object authorization

import express from "express";
import jwt from "jsonwebtoken";
import jwksClient from "jwks-rsa";
// `db` below is your data-access layer, assumed to be in scope.
const app = express();

const client = jwksClient({ jwksUri: "https://idp.example.com/.well-known/jwks.json" });
const ISSUER = "https://idp.example.com/";
const AUD = "api://payments";

// Resolve the signing key by kid; pass lookup errors through instead of dereferencing a missing key.
function getKey(header, cb){
  client.getSigningKey(header.kid, (err, key)=> err ? cb(err) : cb(null, key.getPublicKey()));
}

function requireJwt(req, res, next){
  const [scheme, token] = (req.headers.authorization || "").split(" ");
  if(scheme !== "Bearer" || !token) return res.status(401).json({error:"missing_token"});
  // verify() checks signature and exp, plus iss and aud through the options (aud may be a string or an array).
  jwt.verify(token, getKey, { algorithms:["RS256"], issuer: ISSUER, audience: AUD }, (err, decoded)=>{
    if(err) return res.status(401).json({error:"invalid_token"});
    req.user = decoded; next();
  });
}

app.get("/v1/accounts/:id", requireJwt, async (req, res)=>{
  const acct = await db.accounts.findById(req.params.id);
  // Same response for "missing" and "not yours", so IDs cannot be enumerated.
  if(!acct || acct.tenant_id !== req.user.tid || acct.owner !== req.user.sub)
    return res.status(403).json({error:"forbidden"});
  res.json(acct);
});

Python (FastAPI) - schema + ownership

PYTHON

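from typing import Literal, Optional

from fastapi import FastAPI, Header, HTTPException
from pydantic import BaseModel, ConfigDict, Field

app = FastAPI()

ALLOWED_TENANTS = {"tenant-123"}  # placeholder; load from your tenant registry

class PaymentCreate(BaseModel):
    # Reject unknown fields to stop mass assignment (assumes Pydantic v2).
    model_config = ConfigDict(extra="forbid")
    amount: int = Field(gt=0)
    currency: Literal["USD", "EUR", "GBP"]  # mirrors the OpenAPI enum

def verify_token(token: str) -> dict:
    # Parse and verify JWT signature, iss, aud, exp using your library of choice.
    # Return dict with 'sub' and 'tid' on success; raise HTTPException(401) on failure.
    raise NotImplementedError("plug in your JWT library here")

@app.post("/v1/payments", status_code=201)
def create_payment(body: PaymentCreate, authorization: Optional[str] = Header(None)):
    if not authorization or not authorization.startswith("Bearer "):
        raise HTTPException(401, "missing_token")
    user = verify_token(authorization[len("Bearer "):])
    # Ownership example for write: ensure user can create for this tenant
    if user["tid"] not in ALLOWED_TENANTS:
        raise HTTPException(403, "forbidden")
    return {"status": "created"}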

Request normalization, rate limits, and replay protection

NGINX (snippet) - JWT file + rate limit + normalization

NGINX

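# Rate-limit key: empty for reads (requests with an empty key are not counted), client address for writes.
map $request_method $write_key {
  default $binary_remote_addr;
  GET     "";
  HEAD    "";
  OPTIONS "";
}

limit_req_zone $write_key zone=write_zone:10m rate=10r/s;

server {
  listen 443 ssl;
  # ssl_certificate and ssl_certificate_key omitted for brevity
  location /v1/ {
    # Token validation via JWT key file (or use an auth subrequest)
    auth_jwt "secured";               # requires nginx-plus or module; otherwise external auth
    auth_jwt_key_file /etc/nginx/jwk.json;

    # Normalization itself happens in the app; this header only marks requests that passed the edge
    proxy_set_header X-Normalized "1";
    limit_req zone=write_zone burst=20 nodelay;
    limit_req_status 429;             # default is 503; 429 matches what clients and dashboards expect

    proxy_pass http://payments_upstream;
  }
}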

Idempotency + replay window (Node)
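
The idea behind this heading can be sketched as follows, in Python rather than Node to match the other added examples: reject requests whose timestamp falls outside the 5-minute window, and return the stored result when an idempotency key repeats. The in-memory dictionary stands in for a shared store such as Redis, and `IdempotencyStore`, `run_once`, and `KEY_TTL_S` are hypothetical names.

```python
import time
from typing import Any, Callable, Dict, Tuple

REPLAY_WINDOW_S = 300      # the 5-minute window from the Quickstart
KEY_TTL_S = 24 * 3600      # hypothetical retention for idempotency keys


class IdempotencyStore:
    """In-memory stand-in for a shared store (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._seen: Dict[str, Tuple[float, Any]] = {}

    def run_once(self, key: str, sent_at: float,
                 handler: Callable[[], Any]) -> Tuple[str, Any]:
        now = self._clock()
        # Replay guard: the client timestamp must be within the window.
        if abs(now - sent_at) > REPLAY_WINDOW_S:
            return ("rejected_stale", None)
        # Expire old keys so the store stays bounded.
        self._seen = {k: v for k, v in self._seen.items()
                      if now - v[0] <= KEY_TTL_S}
        if key in self._seen:
            return ("replayed", self._seen[key][1])
        result = handler()
        self._seen[key] = (now, result)
        return ("executed", result)
```

A retry with the same key gets the original result back without running the handler a second time, which is what makes write routes safe to retry.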

Webhook signature verify (Python)

PYTHON

import hmac, hashlib, time

def verify(sig_header: str, payload: str, secret: str) -> bool:
    # header format: t=unix,s=hex(hmac_sha256(f"{t}.{payload}", secret))
    # payload must be the raw request body exactly as received, not a re-serialized copy
    try:
        # Parse by key so field order does not matter.
        parts = dict(p.strip().split("=", 1) for p in sig_header.split(","))
        t, s = parts["t"], parts["s"]
        if abs(time.time() - int(t)) > 300:  # 5-minute replay window
            return False
        mac = hmac.new(secret.encode(), f"{t}.{payload}".encode(), hashlib.sha256).hexdigest()
        return hmac.compare_digest(mac, s.lower())
    except (KeyError, ValueError, TypeError):
        return False

GraphQL hardening

Accept persisted queries only; reject ad hoc query documents sent in the request body.
Enforce depth and cost limits; disallow introspection in prod.
Put authorization at resolver level for sensitive fields.

Apollo Server (persisted queries)

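import { ApolloServer } from "@apollo/server";
import { GraphQLError } from "graphql";
// ALLOWED: Set of SHA-256 hashes of approved query documents, built at deploy time (assumption).
const allowlistOnly = { async requestDidStart() { return { async didResolveOperation({ queryHash }) {
  if (!ALLOWED.has(queryHash)) throw new GraphQLError("unknown_query", { extensions: { code: "FORBIDDEN" } }); } }; } };
const server = new ApolloServer({ typeDefs, resolvers, plugins: [allowlistOnly] });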

gRPC controls

mTLS between services; method allowlists at proxy.
Deadlines on every call; max message size; RBAC per method.

Go - unary interceptor sketch

type ctxKey string // private key type avoids context-key collisions

const principalKey ctxKey = "principal"

func authInterceptor(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
  md, ok := metadata.FromIncomingContext(ctx)
  if !ok || len(md.Get("authorization")) == 0 { // a missing header must not panic
    return nil, status.Error(codes.Unauthenticated, "missing token")
  }
  token := strings.TrimPrefix(md.Get("authorization")[0], "Bearer ")
  sub, tid, err := verifyJWT(token) // verify iss, aud, exp
  if err != nil { return nil, status.Error(codes.Unauthenticated, "bad token") }

  if !allowed(info.FullMethod, sub, tid) { // method-level RBAC/ABAC
    return nil, status.Error(codes.PermissionDenied, "forbidden")
  }
  ctx = context.WithValue(ctx, principalKey, sub)
  return handler(ctx, req)
}

CI: gates that don’t flake (GitHub Actions example)

YAML

name: api-ci
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install deps
        run: npm ci
      - name: Contract checks
        run: npm run openapi:lint && npm run openapi:bundle
      - name: Negative tests (curl)
        run: |
          ./scripts/neg-tests.sh   # includes IDOR, overposting, expired/wrong aud token cases
      - name: GraphQL safety
        run: npm run gql:persist && npm run gql:check-depth-cost
      - name: Fuzz light
        run: npm run fuzz:payloads

Sample negative tests (bash)

BASH

set -euo pipefail   # any failed assertion fails the CI step

# 1) Wrong audience
curl -s -o /dev/null -w "%{http_code}\n" \
 -H "Authorization: Bearer $WRONG_AUD" https://api.example.com/v1/payments | grep -q "^401$"

# 2) IDOR attempt
curl -s -o /dev/null -w "%{http_code}\n" \
 -H "Authorization: Bearer $TOKEN_A" https://api.example.com/v1/accounts/${OTHER_USER} | grep -q "^403$"

# 3) Overposting
curl -s -o /dev/null -w "%{http_code}\n" \
 -H "Authorization: Bearer $TOKEN_A" -H "Content-Type: application/json" \
 -d '{"amount":100,"currency":"USD","role":"admin"}' \
 https://api.example.com/v1/payments | grep -qE "^(400|422)$"

Observability for security

Log shape (JSON)

JSON

{
  "ts":"2025-09-04T12:00:00Z",
  "correlation_id":"c-49f8",
  "route":"POST /v1/payments",
  "principal":"sub:a1f3...",
  "tenant":"tid:t-77",
  "decision":"allow",
  "reason":"scopes:payments.write",
  "latency_ms":42,
  "pii_masked":true
}
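
A record of this shape could be emitted by a small helper like the one below. It is a sketch under assumptions: the principal is masked with a keyed hash prefix, and `MASK_KEY`, `mask`, and `decision_log` are hypothetical names.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Optional

MASK_KEY = b"rotate-me"  # hypothetical; load from your secret manager


def mask(value: str) -> str:
    """Keyed hash prefix: stable for correlation, not reversible without the key."""
    return hmac.new(MASK_KEY, value.encode(), hashlib.sha256).hexdigest()[:8]


def decision_log(correlation_id: str, route: str, sub: str, tid: str,
                 decision: str, reason: str, latency_ms: int,
                 now: Optional[datetime] = None) -> str:
    now = now or datetime.now(timezone.utc)
    record = {
        "ts": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "correlation_id": correlation_id,
        "route": route,
        "principal": f"sub:{mask(sub)}",  # never log the raw subject
        "tenant": f"tid:{tid}",
        "decision": decision,
        "reason": reason,
        "latency_ms": latency_ms,
        "pii_masked": True,
    }
    return json.dumps(record, separators=(",", ":"))
```

Logging the decision and its reason, rather than the payload, keeps the record useful for audits without storing secrets.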

Detections to wire

403 spikes by route and principal
Schema violation bursts on a route or version
Tokens used across multiple services in short time window
Repeated webhook IDs or timestamp skew
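
The first detection, 403 spikes by route and principal, can be sketched as a sliding-window counter. The window and threshold are assumed values and `SpikeDetector` is a hypothetical name; in practice this logic usually lives in the log pipeline rather than in the service.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class SpikeDetector:
    def __init__(self, window_s: float = 60.0, threshold: int = 20):
        self.window_s = window_s
        self.threshold = threshold
        self._events: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def record_403(self, route: str, principal: str, ts: float) -> bool:
        """Record one 403 and return True when the window count hits the threshold."""
        q = self._events[(route, principal)]
        q.append(ts)
        # Drop events that have aged out of the window.
        while q and ts - q[0] > self.window_s:
            q.popleft()
        return len(q) >= self.threshold
```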

API discovery and API-BOM

Track this in code (CSV/JSON in repo) and surface via dashboard.

CSV

service, path, method, version, owner, data_class, auth, pii_fields, last_seen, risk
payments, /v1/payments, POST, 1, @team-pay, sensitive, oauth2, amount|currency, 2025-09-01, high
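
Once the API-BOM lives in the repo, a CI step can flag unowned, unauthenticated, or stale entries. The sketch below assumes the column names shown above and a 30-day staleness threshold; `audit_bom` and the finding labels are hypothetical.

```python
import csv
import io
from datetime import date
from typing import List, Tuple


def audit_bom(csv_text: str, today: date,
              stale_days: int = 30) -> List[Tuple[str, List[str]]]:
    """Return (endpoint, findings) pairs for rows that need attention."""
    report = []
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for raw in reader:
        # Skip overflow columns (key None) and tolerate short rows (value None).
        row = {k.strip(): (v or "").strip() for k, v in raw.items() if k}
        findings = []
        if not row.get("owner"):
            findings.append("no_owner")       # shadow endpoint candidate
        if row.get("auth", "").lower() in ("", "none"):
            findings.append("no_auth")
        last_seen = date.fromisoformat(row["last_seen"])
        if (today - last_seen).days > stale_days:
            findings.append("stale")          # zombie endpoint candidate
        if findings:
            report.append((f'{row["method"]} {row["path"]}', findings))
    return report
```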

First 90 days engineering plan

30 days - baseline

Schemas published for top services; JWT aud/iss checks everywhere; write-route limits + normalization; KPI baseline.

60 days - harden critical paths

Ownership checks on account, payment, password flows; persisted queries + depth/cost caps; CI jobs for IDOR + overposting.

90 days - clean up and codify

Remove dead routes; enforce deprecation calendar; auth/schema error budgets with rollback; updated service template.

RACI that keeps velocity

Platform: shared auth libs, policy bundles, CI jobs, templates.
Service teams: adopt bundles, implement ownership checks, add tests.
Security: define rules, budgets, evidence; review violations.
SRE: run gateways/mesh, alerts, rollbacks; publish reliability SLOs.
Product: deprecation windows; partner comms on version changes.

Anti-patterns to retire

Custom auth sprinkled in services; only-at-edge schema checks; lenient staging vs strict prod; long-lived tokens; PII in logs; alerts with no owners; launching breaking changes without shadow validation.

Introduction to Levo, how we help

Levo gives privacy-preserving runtime visibility and contract validation without moving payloads out of your boundary. Findings become policies and CI tests you adopt service by service. Pricing stays predictable as services and environments grow, so platform teams can raise reliability and speed at the same time.

To see how this looks in practice, book a demo: a short working session on your two highest-risk flows.

Conclusion

Contracts, portable policies, and small reliable tests make reliability a pipeline property, not a late-night firefight. Ship these guardrails with every service and you’ll go faster with fewer incidents.

Related: Learn how Levo approaches API security with its fix-first approach and a product that is scale-agnostic and data-privacy-first, with growth-immune pricing: Levo's API Solution.

FAQs

What’s the fastest way to stop BOLA today?
Add a shared helper for tenant + subject checks. Call it on every read/write of sensitive resources. Add two negative tests per route.

Do I put auth at edge or in code?
Both. Edge for coarse checks and token validity. In code for object ownership and business rules. Keep both as code.

Will rate limits break customers?
Use soft limits on reads, stricter on writes. Scope by principal and route. Monitor 429s and tune weekly.

How do I make GraphQL safe without killing flexibility?
Persisted queries only, depth and cost caps, field-level auth for sensitive data. Version queries like you version REST.

gRPC best practices in one line?
mTLS + SPIFFE IDs, method allowlists, deadlines, max message size, and proxy-level RBAC.

How do I avoid flaky CI gates?
Small tests with realistic fixtures; assert on exact contract violations; run the noisy gates in monitor for a sprint before blocking.

What should I log?
Correlation ID, route, principal, decision, reason, latency. Mask PII. Keep debug logs short-lived.

How do I detect IDOR early?
Alert on 403 spikes and sequential ID access. Pair with schema-violation alerts for the same route.

How do I secure webhooks?
HMAC signatures with timestamps; 5-minute replay window; store nonce per event; idempotent handlers.

How do I retrofit a legacy service?
Wrap at gateway with token validation + schema checks + limits. Add ownership in code next sprint. Plan version retirement.

What is “request normalization”?
Canonicalize field order, trim whitespace, lower-case headers where safe, de-dup params, bound array sizes. Prevent near-duplicate floods.
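
A minimal sketch of body canonicalization, covering field order, whitespace trimming, and array bounds only (header and query-parameter handling are left out). `normalize`, `fingerprint`, and `MAX_ARRAY` are hypothetical names; the fingerprint lets a limiter or dedup cache treat near-duplicate payloads as one.

```python
import hashlib
import json
from typing import Any

MAX_ARRAY = 100  # hypothetical bound on array length


def normalize(value: Any) -> Any:
    if isinstance(value, dict):
        return {key: normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        if len(value) > MAX_ARRAY:
            raise ValueError("array_too_long")
        return [normalize(item) for item in value]
    if isinstance(value, str):
        return value.strip()
    return value


def fingerprint(payload: Any) -> str:
    # sort_keys gives a canonical field order; compact separators remove
    # formatting differences between otherwise identical payloads.
    canonical = json.dumps(normalize(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()
```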

Where do I store evidence for audits?
In the repo with policy code, test results, and dashboards as artifacts. Automate a weekly export.

How do I handle agent/LLM traffic?
Allowlist tools/routes, cap outputs, scrub prompts from logs, monitor vector-store access for sensitive terms.

When do I block vs monitor?
Monitor in lower envs and for new rules; block high-risk routes in prod. Tie to violation budgets with rollback rules.

How do I keep templates from drifting?
CI checks that enforce template files and versions; a quarterly “template sync” PR across services.

What’s the minimum “secure by default” scaffold?
OpenAPI/SDL, JWT helper, ownership helper, request validator, rate-limit headers, idempotency + replay guards, structured logging, basic neg tests.

How do I make error messages safe?
Return codes and generic reasons; put detail in logs with masking; never echo raw queries or secrets.
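
As a sketch of that split, the helper below returns a generic body to the client and logs only the exception type alongside a correlation ID. The `PUBLIC_ERRORS` table and `safe_error` name are assumptions for illustration.

```python
import logging
import uuid
from typing import Dict, Optional, Tuple

logger = logging.getLogger("api.errors")

# Generic, client-safe reasons per status code (hypothetical table).
PUBLIC_ERRORS = {
    400: "invalid_request",
    401: "invalid_token",
    403: "forbidden",
    404: "not_found",
    429: "rate_limited",
}


def safe_error(status: int, exc: Exception,
               correlation_id: Optional[str] = None) -> Tuple[int, Dict[str, str]]:
    correlation_id = correlation_id or f"c-{uuid.uuid4().hex[:8]}"
    # Log the exception type only; its message may contain queries or secrets.
    logger.warning("request_failed status=%s correlation_id=%s type=%s",
                   status, correlation_id, type(exc).__name__)
    if status not in PUBLIC_ERRORS:
        return 500, {"error": "internal_error", "correlation_id": correlation_id}
    return status, {"error": PUBLIC_ERRORS[status], "correlation_id": correlation_id}
```

The correlation ID is the only link the client receives, so support can find the detailed log entry without the response exposing it.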

Any tips for partner sandboxes?
Same policies as prod, lower thresholds; seed test data; rotate sandbox credentials often; publish a replay policy.

How do I prove ROI to leadership?
Show fewer incidents, faster MTTR, stable launch metrics, shorter security questionnaires, and predictable cost while traffic grows.

How do I get teams to adopt this?
Make the secure path the easy path: templates, helpers, and passing CI by default. Recognition for teams that remove dead routes and hit budgets.
