A practical, evidence-first playbook to secure models, RAG, and agentic AI across your stack.
TL;DR
- Treat Security for AI (hardening your apps, models, agents) and AI for Security (copilots for SecOps) as two distinct programs with separate backlogs, budgets, and KPIs; connect them through a shared evidence bus and evaluation packs.
- Put policy at the boundary: every AI call goes through a gateway for input and output policy, schemas, budgets, approvals, and trace export.
- Make schema-first outputs and deny-by-default the default path: malformed outputs never reach effectful systems.
- Prove provenance: sign data and indexes, attach source IDs, and keep takedown workflows.
- Run continuous assurance: evals in CI, weekly adversarial packs, and a Safety Scorecard that drives release decisions.
Why AI Security Now
AI is already in your workflows: chatbots, knowledge search, code assistants, agentic automations. That turns untrusted inputs into potential instructions and model outputs into actions. The job is to reduce surprise, control cost, and keep an audit trail you can replay in minutes.
What Needs Protecting
- Data lifecycle: sources, licenses, consent, manifests, takedown.
- Models and pipelines: versions, routers, and evals.
- Apps, connectors, and plugins: input policy and output validation.
- Agents and tools: typed adapters, sandboxing, approvals.
- Infrastructure and secrets: segmentation, just-in-time credentials, and observability.
Definition of Done For Any AI Workload
Documented assets and risks; evaluated with safety and adversarial tests; governed with approvals; isolated with allow-lists and sandboxes; observable with a complete evidence trail; reversible with rollback; compliant with dated obligations.
The Two-Lens Model You Should Run In Parallel
- Security for AI: gateways, schemas, provenance, isolation, evidence, and gates across the SDLC.
- AI for Security: copilots that summarize alerts, draft playbooks, and triage tickets on the same evidence bus.
Keep them distinct in ownership and scorecards, but link them tightly. Reuse evaluations, export traces in open formats, and feed incidents back into policies and tests.
Day One Guardrails To Apply Everywhere
- Policy at the boundary: all model calls pass through the gateway for input and output policy, budgets, and evidence.
- Schema-first outputs: tools require typed outputs; deny on mismatch (see the sketch after this list).
- Least privilege and time-bound access: short-lived credentials, per-tool scopes, egress allow-lists.
- Signed data and provenance: manifests for corpora and indexes; retrieval includes source IDs and signatures.
- Continuous assurance: evals in CI and on shadow traffic; block on critical regressions.
- Replayability: you can reconstruct any session from logs within minutes.
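A minimal sketch of the deny-by-default path, assuming a hypothetical create_ticket tool and Pydantic for validation; the schema and field names are illustrative, not a prescribed contract:

```python
# Deny-by-default output validation: the gateway only forwards typed
# objects, so malformed model output never reaches effectful systems.
from pydantic import BaseModel, Field, ValidationError

class CreateTicketArgs(BaseModel):  # illustrative tool schema
    title: str = Field(min_length=1, max_length=200)
    priority: str = Field(pattern="^(low|medium|high)$")

def gate_tool_call(raw_json: str) -> CreateTicketArgs | None:
    """Return typed args only if the model output matches the schema."""
    try:
        return CreateTicketArgs.model_validate_json(raw_json)
    except ValidationError:
        return None  # deny on mismatch; log and escalate, never execute
```

Anything that fails validation is dropped or escalated; the downstream tool never sees free-form text.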
Quick Checklist
- Gateway on every route
- Prompt templates in version control
- Output schemas for every tool and text
- Budgets and loop caps per tenant and per session
- Log redaction for PII and secrets
- OpenTelemetry or JSON traces to SIEM and GRC (trace record sketched after this checklist)
- Weekly Safety Scorecard in review
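For the trace-export item above, a minimal sketch of a replayable JSON-lines trace record; the field names, and the choice to log hashes rather than raw prompts (to satisfy the redaction item), are assumptions, not a prescribed format:

```python
# Emit one replayable, redacted trace record per model call.
import json
import time
import uuid

def emit_trace(sink, *, route: str, model: str, policy_decision: str,
               prompt_hash: str, output_hash: str) -> None:
    record = {
        "trace_id": str(uuid.uuid4()),
        "ts": time.time(),
        "route": route,
        "model": model,
        "policy_decision": policy_decision,  # allow / deny / escalate
        "prompt_sha256": prompt_hash,         # hash, not raw PII
        "output_sha256": output_hash,
    }
    sink.write(json.dumps(record) + "\n")     # JSON lines for SIEM ingest
```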
Threats And Controls That Matter
Application Level Risks, Shortlist
- Prompt injection: attackers hide active instructions in inputs or retrieved pages. Mitigate with input policy at the gateway, context segregation, allow-listed sources, and schema-only tool calls (input-policy sketch after this list).
- Insecure output handling: treating model text as commands. Mitigate with strict schemas, typed adapters, and human-in-the-loop for effectful actions.
- Training or retrieval poisoning: corrupted examples or poisoned pages. Mitigate with signed data manifests, drift monitors, and quarantine playbooks.
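To make input policy concrete, a minimal sketch combining deny patterns with an allow-list of retrieval sources; the patterns and domains are illustrative and a real gateway would layer classifiers on top:

```python
# Gateway input policy: reject obvious injection markers and context
# pulled from sources outside the allow-list.
import re
from urllib.parse import urlparse

DENY_PATTERNS = [re.compile(p, re.I) for p in (
    r"ignore (all|previous) instructions",
    r"reveal (the )?system prompt",
)]
ALLOWED_SOURCES = {"docs.example.com", "wiki.example.com"}  # illustrative

def input_policy(user_text: str, retrieved_urls: list[str]) -> bool:
    """Deny on injection markers or non-allow-listed retrieval sources."""
    if any(p.search(user_text) for p in DENY_PATTERNS):
        return False
    return all(urlparse(u).hostname in ALLOWED_SOURCES
               for u in retrieved_urls)
```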
Where Attacks Show Up In Practice
- Plugins and tools: SSRF and SQL injection via generated strings.
- Routers and gateways: budget and loop-cap evasion, cache poisoning.
- Retrieval and indexes: license gaps and prompt stuffing.
- Supply chain: unsigned models or plugins.
- Secrets and identity: long-lived tokens and keys in prompts.
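To make the first of these concrete, a minimal egress guard for model-generated URLs; the allow-list is illustrative, and real deployments should also pin DNS resolution at fetch time:

```python
# Egress guard against SSRF via generated strings: only allow-listed
# hosts, and never hosts that resolve to internal address ranges.
import ipaddress
import socket
from urllib.parse import urlparse

EGRESS_ALLOW = {"api.example.com"}  # illustrative allow-list

def egress_allowed(url: str) -> bool:
    """Permit outbound fetches only to allow-listed, public hosts."""
    host = urlparse(url).hostname or ""
    if host not in EGRESS_ALLOW:
        return False
    try:
        # Catch DNS tricks that point an allowed name at internal ranges.
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except OSError:
        return False
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```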
Anti Patterns
Logging raw prompts and outputs with PII, parsing free-form text downstream, and treating red teaming as a one-time exercise.
Reference Architectures You Can Adopt Now
Each blueprint includes purpose, when to use, trust boundaries, control points, required evidence, acceptance tests, KPIs, and a one-week rollout.
Thin Wrapper LLM App
The safest pattern for copilots and Q&A when there are no tool calls and retrieval is minimal.
Control points: pre-filters for injection and sensitive-information disclosure, structured prompts in git, output validation with a schema even for text, refusal and escalation on low confidence, rate and budget caps, and trace export with model and policy decisions (refusal sketch below).
Acceptance tests: injection block rate at least 95 percent, schema pass rate at least 99 percent on first try, zero confirmed leaks, replay of any answer within 2 minutes.
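A minimal sketch of the refusal-and-escalation control point, assuming the model is prompted to return JSON with an answer and a self-reported confidence; the schema and threshold are illustrative:

```python
# Schema-validated text output with refusal and escalation on low
# confidence: even plain answers go through a typed contract.
from pydantic import BaseModel, Field, ValidationError

class Answer(BaseModel):  # illustrative text-output schema
    text: str = Field(min_length=1, max_length=4000)
    confidence: float = Field(ge=0.0, le=1.0)

def respond(raw_json: str, threshold: float = 0.6) -> str:
    try:
        ans = Answer.model_validate_json(raw_json)
    except ValidationError:
        return "REFUSE: malformed output"   # schema applies even to text
    if ans.confidence < threshold:
        return "ESCALATE: low confidence"   # route to a human reviewer
    return ans.text
```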
Enterprise Agent Gateway
For assistants that act. Mediate every tool call through policy and schemas, run tools in sandboxes with controlled egress, require approvals for effectful actions, apply budgets and loop caps, and keep replayable traces that link prompts, plans, schemas, approvals, and results.
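A minimal sketch of budgets, loop caps, and approval gating at the agent gateway; the limits and the approval flag are illustrative assumptions about your session state:

```python
# Per-session guard: loop caps and token budgets bound runaway agents,
# and effectful actions are admitted only with a captured approval.
class SessionGuard:
    def __init__(self, max_steps: int = 10, max_tokens: int = 50_000):
        self.max_steps, self.max_tokens = max_steps, max_tokens
        self.steps = self.tokens = 0

    def admit(self, tokens: int, effectful: bool, approved: bool) -> bool:
        self.steps += 1
        self.tokens += tokens
        if self.steps > self.max_steps or self.tokens > self.max_tokens:
            return False   # loop cap or budget exceeded: abort the run
        if effectful and not approved:
            return False   # effectful actions require a recorded approval
        return True
```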
Private RAG
Sign corpora and index manifests, attach source IDs to context, isolate retrieval per index, and evaluate grounding and faithfulness as part of CI.
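A minimal sketch of a signed index manifest with per-document source IDs, using HMAC as a stand-in for whatever signing infrastructure you actually run:

```python
# Build and verify a signed manifest mapping source IDs to content
# hashes, so retrieved context can be traced to signed corpora.
import hashlib
import hmac
import json

def build_manifest(docs: dict[str, bytes], key: bytes) -> dict:
    entries = {src_id: hashlib.sha256(body).hexdigest()
               for src_id, body in docs.items()}
    payload = json.dumps(entries, sort_keys=True).encode()
    return {"entries": entries,
            "signature": hmac.new(key, payload, hashlib.sha256).hexdigest()}

def verify_manifest(manifest: dict, key: bytes) -> bool:
    payload = json.dumps(manifest["entries"], sort_keys=True).encode()
    expect = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, manifest["signature"])
```

At retrieval time, each chunk carries its source ID, so a reviewer can check the answer's context against the verified manifest.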
High Risk Isolation
Route sensitive or high impact tasks to a hardened tier with tighter budgets, stricter schemas, stronger sandboxing, and default refusal when uncertainty is high.
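A minimal routing sketch, assuming illustrative sensitivity and uncertainty scores from an upstream classifier; the thresholds are placeholders to tune against your own traffic:

```python
# Route by risk: refuse by default when uncertain, send sensitive work
# to the hardened tier, and let routine tasks use the standard tier.
def route(sensitivity: float, uncertainty: float) -> str:
    if uncertainty >= 0.7:
        return "refuse"          # default refusal when uncertainty is high
    if sensitivity >= 0.5:
        return "hardened-tier"   # tighter budgets, stricter schemas
    return "standard-tier"
```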
Testing And Assurance That Actually Blocks Regressions
- AI red teaming across plugins, gateways, retrieval, supply chain, and secrets.
- Continuous evaluation in PR and nightly jobs, with adversarial packs that stress safety, leakage, grounding, and structure (CI gate sketch after this list).
- AI pen testing with black-box, gray-box, and white-box rounds.
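A minimal sketch of the CI gate referenced above, assuming a results dict produced by your eval harness; the thresholds mirror the thin-wrapper acceptance tests and are illustrative:

```python
# CI eval gate: a nonzero exit code blocks the merge on any critical
# regression against the agreed floors.
import sys

THRESHOLDS = {"injection_block_rate": 0.95,   # illustrative floors
              "schema_pass_rate": 0.99,
              "grounding_score": 0.90}

def gate(results: dict[str, float]) -> int:
    failures = [k for k, floor in THRESHOLDS.items()
                if results.get(k, 0.0) < floor]
    for k in failures:
        print(f"CRITICAL regression: {k}={results.get(k)} < {THRESHOLDS[k]}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(gate({"injection_block_rate": 0.97,
                   "schema_pass_rate": 0.995,
                   "grounding_score": 0.92}))
```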
Definition of Done For Assurance
PR and nightly evals are green; scenario packs run with fixes landed; pen tests on new adapters have no open criticals; the evidence bundle is complete and exportable; on-call can disable routes and roll back within SLO.
Tooling Fit Checks
Reproducibility and open formats, provenance tagging and license tracking, coverage of safety and structure tests, CI performance, alerting on thresholds, and an API for CI with dashboards for engineering and risk.
Governance, Compliance, And The 2025 Lawscape
Pair frameworks to move fast and stay auditable: NIST AI RMF as the risk backbone, ISO/IEC 42001 to run an auditable management system, and SAIF for practitioner controls. Keep a minimum evidence bundle per workload: SBOM, signed data manifests, gateway policy snapshots, router rules, schemas, tool scopes, human-in-the-loop criteria, evaluation results, and exports to SIEM and GRC (a sketch of the bundle layout follows).
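A minimal sketch of how the bundle index might be laid out; every key and path here is illustrative, not a mandated structure:

```python
# Evidence bundle index per workload: one manifest pointing at the
# artifacts auditors and GRC tooling need to pull.
evidence_bundle = {
    "sbom": "artifacts/sbom.json",
    "data_manifests": "artifacts/manifests.sig.json",
    "gateway_policy_snapshot": "artifacts/policy-snapshot.json",
    "router_rules": "artifacts/router.yaml",
    "schemas": "artifacts/schemas/",
    "tool_scopes": "artifacts/scopes.json",
    "hitl_criteria": "artifacts/approvals.md",
    "eval_results": "artifacts/evals/latest.json",
    "siem_export": "exports/traces/",
}
```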
Regional Playbooks, Starter Set
- EU GPAI providers: transparency docs, state-of-the-art cybersecurity, incident logging, and post-market monitoring.
- US federal deployers: inventories, risk assessments, impact statements, incident handling, oversight.
- Singapore private sector: purpose limitation, data minimization, consent, and explainability.
- South Korea AI Basic Act: labeling, safety, and transparency duties.
- Australia: data supply chain security and drift control across the lifecycle.
Lifecycle Gates
Gate 2: evals stable and red-team plan ready.
Gate 3: evidence bundle v1, policy snapshots, signed corpora, human-in-the-loop configs, rollback plan, no critical regressions.
Gate 4: weekly Safety Scorecard in production, drift and cost dashboards, quarterly evidence refresh for board and auditors.
Operating Model, Cadences, And KPIs
Tri-Owner Model
The CISO is accountable; Product Security owns Security for AI; the SOC runs AI for Security; Data and Privacy owns sources and minimization; Engineering owns gateways, schemas, and telemetry.
Rituals That Keep You Honest
Weekly AI risk standup and Safety Scorecard review, monthly program review, quarterly board brief and tabletop exercise.
KPIs To Watch
- Coverage and posture: percent of traffic behind the gateway, SBOM coverage, signed corpora, schema coverage.
- Quality and safety: eval pass rate, injection block rate, schema pass rate, grounding score, never-event count.
- Cost and performance: cost per task, tokens per request, cache hit rate, loop abort rate, latency SLO.
- Operations and resilience: MTTR, rollback time, drift-to-quarantine time, incident drill pass rate.
- Compliance and audit: evidence completeness, obligations on track, audit findings closed on time.
A Zero To 365 Day Plan You Can Run
0-30 Days
Turn on approval capture for effectful actions; put prompts, policies, and retrieval configs under PR-based change control; and schedule the first AI-specific tabletop. Land a gateway, evals, and observability for one pilot route and produce your first evidence bundle.
31-90 Days
Close SOWs with evidence and exit clauses, wire OpenTelemetry to SIEM and GRC, roll out agent tiers and human-in-the-loop for one effectful use case, publish the weekly Safety Scorecard, and train teams.
90-180 Days
Institutionalize. Expand signed corpora and takedown workflows, broaden eval coverage and red team packs, stand up program reviews, and wire obligations to gates.
180-365 Days
Scale safely. Increase coverage to near 100 percent behind the gateway, make evidence bundles routine, raise grounding targets, and tighten cost SLOs by routing and caching.
Smell Test For Releases
- Is every model or tool behind a gateway with policies?
- Will malformed output be rejected before any side effect?
- Can you list the signed sources for what the model saw?
- Do tools run in a sandbox with allow-listed egress?
- Do all effectful actions require approvals and leave artifacts?
- Can you reconstruct any session from logs in three minutes?
- Can you disable a route, tool, or model in under a minute and roll back in fifteen?
Cost And ROI Framing
Look beyond the license price: include build, run, switching, and risk costs. Use caching and context budgets, safer fallback routing, per-tenant budgets, log tiering, and dataset deduplication. Prove ROI with a two-week test: flip schemas, routing, caching, and budgets on one route and measure block rates, pass rates, grounding, replay time, and cost per task.
Conclusion
Security for AI should make teams faster and safer at the same time. Land the gateway and evidence bus first, turn schemas and provenance into paved roads, and make safety visible in scorecards that drive decisions. When you can replay any action, prove sources, and roll back quickly, AI becomes an asset you can scale with confidence.