A practical, evidence-first playbook to secure models, RAG, and agentic AI across your stack.
TL;DR
- Treat Security for AI (hardening your apps, models, agents) and AI for Security (copilots for SecOps) as two distinct programs with separate backlogs, budgets, and KPIs; connect them through a shared evidence bus and evaluation packs.
- Put policy at the boundary: every AI call goes through a gateway for input and output policy, schemas, budgets, approvals, and trace export.
- Make schema-first outputs and deny-by-default the default path: malformed outputs never reach effectful systems.
- Prove provenance: sign data and indexes, attach source IDs, and keep takedown workflows.
- Run continuous assurance: evals in CI, weekly adversarial packs, and a Safety Scorecard that drives release decisions.
Why AI Security Now
AI is already in your workflows: chatbots, knowledge search, code assistants, agentic automations. That turns untrusted inputs into potential instructions and model outputs into actions. The job is to reduce surprise, control cost, and keep an audit trail you can replay in minutes.
What Needs Protecting
- Data lifecycle: sources, licenses, consent, manifests, takedown.
- Models and pipelines: versions, routers, and evals.
- Apps, connectors, and plugins: input policy and output validation.
- Agents and tools: typed adapters, sandboxing, approvals.
- Infrastructure and secrets: segmentation, just-in-time credentials, and observability.
Definition of Done For Any AI Workload
Documented assets and risks; evaluated with safety and adversarial tests; governed with approvals; isolated with allow-lists and sandboxes; observable with a complete evidence trail; reversible with rollback; compliant with dated obligations.
The Two-Lens Model You Should Run In Parallel
- Security for AI: gateways, schemas, provenance, isolation, evidence, and gates across the SDLC.
- AI for Security: copilots that summarize alerts, draft playbooks, and triage tickets on the same evidence bus.
Keep them distinct in ownership and scorecards, but link them tightly. Reuse evaluations, export traces in open formats, and feed incidents back into policies and tests.
Day One Guardrails To Apply Everywhere
- Policy at the boundary: all model calls pass through the gateway for input and output policy, budgets, and evidence.
- Schema-first outputs: tools require typed outputs; deny on mismatch (see the sketch after this list).
- Least privilege and time-bound access: short-lived credentials, per-tool scopes, egress allow-lists.
- Signed data and provenance: manifests for corpora and indexes; retrieval includes source IDs and signatures.
- Continuous assurance: evals in CI and on shadow traffic; block on critical regressions.
- Replayability: you can reconstruct any session from logs within minutes.
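A minimal sketch of the deny-by-default path, assuming a hypothetical create_ticket tool and Pydantic for validation; the schema and field names are illustrative, not a prescribed contract:

```python
# Deny-by-default output validation: the gateway only forwards typed
# objects, so malformed model output never reaches effectful systems.
from pydantic import BaseModel, Field, ValidationError

class CreateTicketArgs(BaseModel):  # illustrative tool schema
    title: str = Field(min_length=1, max_length=200)
    priority: str = Field(pattern="^(low|medium|high)$")

def gate_tool_call(raw_json: str) -> CreateTicketArgs | None:
    """Return typed args only if the model output matches the schema."""
    try:
        return CreateTicketArgs.model_validate_json(raw_json)
    except ValidationError:
        return None  # deny on mismatch; log and escalate, never execute
```

Anything that fails validation is dropped or escalated; the downstream tool never sees free-form text.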
Quick Checklist
- Gateway on every route
- Prompt templates in version control
- Output schemas for every tool and text
- Budgets and loop caps per tenant and per session
- Log redaction for PII and secrets
- OpenTelemetry or JSON traces to SIEM and GRC (trace record sketched after this checklist)
- Weekly Safety Scorecard in review
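For the trace-export item above, a minimal sketch of a replayable JSON-lines trace record; the field names, and the choice to log hashes rather than raw prompts (to satisfy the redaction item), are assumptions, not a prescribed format:

```python
# Emit one replayable, redacted trace record per model call.
import json
import time
import uuid

def emit_trace(sink, *, route: str, model: str, policy_decision: str,
               prompt_hash: str, output_hash: str) -> None:
    record = {
        "trace_id": str(uuid.uuid4()),
        "ts": time.time(),
        "route": route,
        "model": model,
        "policy_decision": policy_decision,  # allow / deny / escalate
        "prompt_sha256": prompt_hash,         # hash, not raw PII
        "output_sha256": output_hash,
    }
    sink.write(json.dumps(record) + "\n")     # JSON lines for SIEM ingest
```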
Threats And Controls That Matter
Application Level Risks, Shortlist
- Prompt injection: attackers hide active instructions in inputs or retrieved pages. Mitigate with input policy at the gateway, context segregation, allow-listed sources, and schema-only tool calls (input-policy sketch after this list).
- Insecure output handling: treating model text as commands. Mitigate with strict schemas, typed adapters, and human-in-the-loop for effectful actions.
- Training or retrieval poisoning: corrupted examples or poisoned pages. Mitigate with signed data manifests, drift monitors, and quarantine playbooks.
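To make input policy concrete, a minimal sketch combining deny patterns with an allow-list of retrieval sources; the patterns and domains are illustrative and a real gateway would layer classifiers on top:

```python
# Gateway input policy: reject obvious injection markers and context
# pulled from sources outside the allow-list.
import re
from urllib.parse import urlparse

DENY_PATTERNS = [re.compile(p, re.I) for p in (
    r"ignore (all|previous) instructions",
    r"reveal (the )?system prompt",
)]
ALLOWED_SOURCES = {"docs.example.com", "wiki.example.com"}  # illustrative

def input_policy(user_text: str, retrieved_urls: list[str]) -> bool:
    """Deny on injection markers or non-allow-listed retrieval sources."""
    if any(p.search(user_text) for p in DENY_PATTERNS):
        return False
    return all(urlparse(u).hostname in ALLOWED_SOURCES
               for u in retrieved_urls)
```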
Where Attacks Show Up In Practice
- Plugins and tools: SSRF and SQL injection via generated strings.
- Routers and gateways: budget and loop-cap evasion, cache poisoning.
- Retrieval and indexes: license gaps and prompt stuffing.
- Supply chain: unsigned models or plugins.
- Secrets and identity: long-lived tokens and keys in prompts.
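To make the first of these concrete, a minimal egress guard for model-generated URLs; the allow-list is illustrative, and real deployments should also pin DNS resolution at fetch time:

```python
# Egress guard against SSRF via generated strings: only allow-listed
# hosts, and never hosts that resolve to internal address ranges.
import ipaddress
import socket
from urllib.parse import urlparse

EGRESS_ALLOW = {"api.example.com"}  # illustrative allow-list

def egress_allowed(url: str) -> bool:
    """Permit outbound fetches only to allow-listed, public hosts."""
    host = urlparse(url).hostname or ""
    if host not in EGRESS_ALLOW:
        return False
    try:
        # Catch DNS tricks that point an allowed name at internal ranges.
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except OSError:
        return False
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```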
Anti Patterns
Logging raw prompts and outputs with PII, parsing free-form text downstream, and treating red teaming as a one-time exercise.
Reference Architectures You Can Adopt Now
Each blueprint includes purpose, when to use, trust boundaries, control points, required evidence, acceptance tests, KPIs, and a one-week rollout.
Thin Wrapper LLM App
The safest pattern for copilots and Q&A when there are no tool calls and retrieval is minimal.
Control points: pre-filters for injection and sensitive-information disclosure, structured prompts in git, output validation with a schema even for text, refusal and escalation on low confidence, rate and budget caps, and trace export with model and policy decisions (refusal sketch below).
Acceptance tests: injection block rate at least 95 percent, schema pass rate at least 99 percent on first try, zero confirmed leaks, replay of any answer within 2 minutes.
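A minimal sketch of the refusal-and-escalation control point, assuming the model is prompted to return JSON with an answer and a self-reported confidence; the schema and threshold are illustrative:

```python
# Schema-validated text output with refusal and escalation on low
# confidence: even plain answers go through a typed contract.
from pydantic import BaseModel, Field, ValidationError

class Answer(BaseModel):  # illustrative text-output schema
    text: str = Field(min_length=1, max_length=4000)
    confidence: float = Field(ge=0.0, le=1.0)

def respond(raw_json: str, threshold: float = 0.6) -> str:
    try:
        ans = Answer.model_validate_json(raw_json)
    except ValidationError:
        return "REFUSE: malformed output"   # schema applies even to text
    if ans.confidence < threshold:
        return "ESCALATE: low confidence"   # route to a human reviewer
    return ans.text
```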
Enterprise Agent Gateway
For assistants that act. Mediate every tool call through policy and schemas, run tools in sandboxes with controlled egress, require approvals for effectful actions, apply budgets and loop caps, and keep replayable traces that link prompts, plans, schemas, approvals, and results.
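A minimal sketch of budgets, loop caps, and approval gating at the agent gateway; the limits and the approval flag are illustrative assumptions about your session state:

```python
# Per-session guard: loop caps and token budgets bound runaway agents,
# and effectful actions are admitted only with a captured approval.
class SessionGuard:
    def __init__(self, max_steps: int = 10, max_tokens: int = 50_000):
        self.max_steps, self.max_tokens = max_steps, max_tokens
        self.steps = self.tokens = 0

    def admit(self, tokens: int, effectful: bool, approved: bool) -> bool:
        self.steps += 1
        self.tokens += tokens
        if self.steps > self.max_steps or self.tokens > self.max_tokens:
            return False   # loop cap or budget exceeded: abort the run
        if effectful and not approved:
            return False   # effectful actions require a recorded approval
        return True
```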
Private RAG
Sign corpora and index manifests, attach source IDs to context, isolate retrieval per index, and evaluate grounding and faithfulness as part of CI.
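A minimal sketch of a signed index manifest with per-document source IDs, using HMAC as a stand-in for whatever signing infrastructure you actually run:

```python
# Build and verify a signed manifest mapping source IDs to content
# hashes, so retrieved context can be traced to signed corpora.
import hashlib
import hmac
import json

def build_manifest(docs: dict[str, bytes], key: bytes) -> dict:
    entries = {src_id: hashlib.sha256(body).hexdigest()
               for src_id, body in docs.items()}
    payload = json.dumps(entries, sort_keys=True).encode()
    return {"entries": entries,
            "signature": hmac.new(key, payload, hashlib.sha256).hexdigest()}

def verify_manifest(manifest: dict, key: bytes) -> bool:
    payload = json.dumps(manifest["entries"], sort_keys=True).encode()
    expect = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expect, manifest["signature"])
```

At retrieval time, each chunk carries its source ID, so a reviewer can check the answer's context against the verified manifest.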
High Risk Isolation
Route sensitive or high impact tasks to a hardened tier with tighter budgets, stricter schemas, stronger sandboxing, and default refusal when uncertainty is high.
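A minimal routing sketch, assuming illustrative sensitivity and uncertainty scores from an upstream classifier; the thresholds are placeholders to tune against your own traffic:

```python
# Route by risk: refuse by default when uncertain, send sensitive work
# to the hardened tier, and let routine tasks use the standard tier.
def route(sensitivity: float, uncertainty: float) -> str:
    if uncertainty >= 0.7:
        return "refuse"          # default refusal when uncertainty is high
    if sensitivity >= 0.5:
        return "hardened-tier"   # tighter budgets, stricter schemas
    return "standard-tier"
```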
Testing And Assurance That Actually Blocks Regressions
- AI red teaming across plugins, gateways, retrieval, supply chain, and secrets.
- Continuous evaluation in PR and nightly jobs, with adversarial packs that stress safety, leakage, grounding, and structure (CI gate sketch after this list).
- AI pen testing with black-box, gray-box, and white-box rounds.
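A minimal sketch of the CI gate referenced above, assuming a results dict produced by your eval harness; the thresholds mirror the thin-wrapper acceptance tests and are illustrative:

```python
# CI eval gate: a nonzero exit code blocks the merge on any critical
# regression against the agreed floors.
import sys

THRESHOLDS = {"injection_block_rate": 0.95,   # illustrative floors
              "schema_pass_rate": 0.99,
              "grounding_score": 0.90}

def gate(results: dict[str, float]) -> int:
    failures = [k for k, floor in THRESHOLDS.items()
                if results.get(k, 0.0) < floor]
    for k in failures:
        print(f"CRITICAL regression: {k}={results.get(k)} < {THRESHOLDS[k]}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(gate({"injection_block_rate": 0.97,
                   "schema_pass_rate": 0.995,
                   "grounding_score": 0.92}))
```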
Definition of Done For Assurance
PR and nightly evals are green; scenario packs run with fixes landed; pen tests on new adapters have no open criticals; the evidence bundle is complete and exportable; on-call can disable routes and roll back within SLO.
Tooling Fit Checks
Reproducibility and open formats, provenance tagging and license tracking, coverage of safety and structure tests, CI performance, alerting on thresholds, and an API for CI with dashboards for engineering and risk.
Governance, Compliance, And The 2025 Lawscape
Pair frameworks to move fast and stay auditable: NIST AI RMF as the risk backbone, ISO/IEC 42001 to run an auditable management system, and SAIF for practitioner controls. Keep a minimum evidence bundle per workload: SBOM, signed data manifests, gateway policy snapshots, router rules, schemas, tool scopes, human-in-the-loop criteria, evaluation results, and exports to SIEM and GRC (a sketch of the bundle layout follows).
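A minimal sketch of how the bundle index might be laid out; every key and path here is illustrative, not a mandated structure:

```python
# Evidence bundle index per workload: one manifest pointing at the
# artifacts auditors and GRC tooling need to pull.
evidence_bundle = {
    "sbom": "artifacts/sbom.json",
    "data_manifests": "artifacts/manifests.sig.json",
    "gateway_policy_snapshot": "artifacts/policy-snapshot.json",
    "router_rules": "artifacts/router.yaml",
    "schemas": "artifacts/schemas/",
    "tool_scopes": "artifacts/scopes.json",
    "hitl_criteria": "artifacts/approvals.md",
    "eval_results": "artifacts/evals/latest.json",
    "siem_export": "exports/traces/",
}
```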
Regional Playbooks, Starter Set
- EU GPAI providers: transparency docs, state-of-the-art cybersecurity, incident logging, and post-market monitoring.
- US federal deployers: inventories, risk assessments, impact statements, incident handling, oversight.
- Singapore private sector: purpose limitation, data minimization, consent, and explainability.
- South Korea AI Basic Act: labeling, safety, and transparency duties.
- Australia: data supply chain security and drift control across the lifecycle.
Lifecycle Gates
Gate 2: evals stable and red-team plan ready.
Gate 3: evidence bundle v1, policy snapshots, signed corpora, human-in-the-loop configs, rollback plan, no critical regressions.
Gate 4: weekly Safety Scorecard in production, drift and cost dashboards, quarterly evidence refresh for board and auditors.
Operating Model, Cadences, And KPIs
Tri-Owner Model
The CISO is accountable; Product Security owns Security for AI; the SOC runs AI for Security; Data and Privacy owns sources and minimization; Engineering owns gateways, schemas, and telemetry.
Rituals That Keep You Honest
Weekly AI risk standup and Safety Scorecard review, monthly program review, quarterly board brief and tabletop exercise.
KPIs To Watch
- Coverage and posture: percent of traffic behind the gateway, SBOM coverage, signed corpora, schema coverage.
- Quality and safety: eval pass rate, injection block rate, schema pass rate, grounding score, never-event count.
- Cost and performance: cost per task, tokens per request, cache hit rate, loop abort rate, latency SLO.
- Operations and resilience: MTTR, rollback time, drift-to-quarantine time, incident drill pass rate.
- Compliance and audit: evidence completeness, obligations on track, audit findings closed on time.
A Zero To 365 Day Plan You Can Run
0-30 Days
Turn on approval capture for effectful actions; put prompts, policies, and retrieval configs under PR-based change control; and schedule the first AI-specific tabletop. Land a gateway, evals, and observability for one pilot route and produce your first evidence bundle.
31-90 Days
Close SOWs with evidence and exit clauses, wire OpenTelemetry to SIEM and GRC, roll out agent tiers and human-in-the-loop for one effectful use case, publish the weekly Safety Scorecard, and train teams.
90-180 Days
Institutionalize. Expand signed corpora and takedown workflows, broaden eval coverage and red team packs, stand up program reviews, and wire obligations to gates.
180-365 Days
Scale safely. Increase coverage to near 100 percent behind the gateway, make evidence bundles routine, raise grounding targets, and tighten cost SLOs by routing and caching.
Smell Test For Releases
- Is every model or tool behind a gateway with policies?
- Will malformed output be rejected before any side effect?
- Can you list the signed sources for what the model saw?
- Do tools run in a sandbox with allow-listed egress?
- Do all effectful actions require approvals and leave artifacts?
- Can you reconstruct any session from logs in three minutes?
- Can you disable a route, tool, or model in under a minute and roll back in fifteen?
Cost And ROI Framing
Look beyond the license price: include build, run, switching, and risk costs. Use caching and context budgets, safer fallback routing, per-tenant budgets, log tiering, and dataset deduplication. Prove ROI with a two-week test: flip schemas, routing, caching, and budgets on one route and measure block rates, pass rates, grounding, replay time, and cost per task.
Conclusion
Security for AI should make teams faster and safer at the same time. Land the gateway and evidence bus first, turn schemas and provenance into paved roads, and make safety visible in scorecards that drive decisions. When you can replay any action, prove sources, and roll back quickly, AI becomes an asset you can scale with confidence.