Most agent security tools are observability tools — they watch what the agent does and alert after the fact. Seirios is an enforcement tool — it makes certain agent behaviours structurally impossible before deployment. It is the difference between fireproof construction and a smoke detector.
Built on the DeepMind AI Agent Traps framework (April 2026) and the ClaudeCode CVE-2026-21852 series — 48 threats formally modelled, 42 with code-layer guards enforced at build time.
The original six agentic security failures remain unsolved in most codebases. The ClaudeCode leak (CVE-2026-21852) and the DeepMind AI Agent Traps research (April 2026) added 21 more confirmed attack vectors — with exploit success rates between 58% and 93%. Seirios is the only compliance platform with formal, code-layer guards against all of them.
API keys, secrets, and tokens hardcoded in agent configs or accessible via environment variables without access controls. An agent that can read .env can exfiltrate everything in it.
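The guard class that applies here can be pictured as an access-control wrapper around environment reads. A minimal sketch — the function and exception names are hypothetical illustrations, not the Seirios API:

```python
import os

class CredentialAccessError(Exception):
    """Raised when an agent reads a secret outside its approved scope."""

def guarded_getenv(var, *, agent_scope):
    """Allow env access only for variables the agent's scope declares.

    `agent_scope` is the set of variable names approved at design time;
    anything else — including a blanket read of .env contents — is refused.
    """
    if var not in agent_scope:
        raise CredentialAccessError(f"{var} is outside the approved scope")
    return os.environ.get(var)
```

An agent whose design-time scope lists only `WEATHER_API_KEY` can read that variable and nothing else; there is simply no code path to the rest of `.env`.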
Malicious instructions embedded in content the agent processes — a document, a webpage, an email, a tool response. The agent acts on the injected instruction believing it came from the user or orchestrator.
Agents with broader file, network, or API access than they need. An agent given full cloud access to "help with infrastructure" can — and will — do things no one intended, including cutting your cloud bill from $2,000 to $150 by deleting the services behind it.
Agent actions — API calls, file writes, database queries — are not logged. When something goes wrong, there is no record of what the agent did, when, or why. Forensics and compliance become impossible.
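The structural fix is to make logging a precondition of execution rather than an afterthought. A minimal sketch of such an audit wrapper — names are hypothetical, and a real guard would write to tamper-evident storage rather than a callable:

```python
import functools
import json
import time

def audited(action_type, log):
    """Decorator: record every agent action before and after it runs.

    The entry is written *before* the call executes, so even a crashed
    or interrupted action leaves a forensic trace.
    """
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            entry = {"action": action_type, "name": fn.__name__,
                     "args": repr(args), "ts": time.time()}
            log(json.dumps(entry))                    # pre-execution record
            result = fn(*args, **kwargs)
            log(json.dumps({**entry, "status": "ok"}))  # completion record
            return result
        return inner
    return wrap
```

Because the decorator wraps the action itself, there is no way to perform the API call, file write, or database query without producing the log entries.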
In multi-agent systems, one agent blindly trusts instructions from another without verifying their source or scope. An attacker who compromises one agent compromises the entire network.
Non-technical users configuring agents via no-code tools, adding tools and MCP servers without understanding the security implications. Every new tool is a new attack surface. Nobody checks.
Two research publications confirmed 21 additional attack vectors specific to agentic AI systems. The Google DeepMind AI Agent Traps framework systematically catalogued adversarial content designed to exploit agents in their environment. The ClaudeCode source leak exposed CVE-2026-21852 — API keys exfiltrated before the trust dialogue appeared.
All 21 threats are formally modelled in the Seirios OCL threat ontology and IPFS-anchored. 15 have blocking code-layer guards (Tier A). 6 have audit logging guards (Tier B).
Dormant adversarial prompts embedded in external resources — websites, documents, emails — that override safety alignment when the agent ingests them. In multimodal settings, a single crafted image can universally jailbreak the model.
An attacker injects fabricated statements into retrieval corpora — wikis, document stores, shared repos. When the agent retrieves attacker content and treats it as verified fact, every downstream decision is compromised. A handful of optimised documents is sufficient.
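A common defence is provenance gating: retrieved content counts as verified fact only if its source is on a design-time allowlist; everything else reaches the model tagged as untrusted data, never as instructions. A minimal sketch — the allowlist and types are illustrative, not the Seirios ontology:

```python
from dataclasses import dataclass

TRUSTED_SOURCES = {"wiki.internal", "docs.internal"}  # assumed allowlist

@dataclass
class RetrievedDoc:
    source: str
    text: str

def partition_by_provenance(docs):
    """Split retrieval results into verified facts and untrusted context.

    Untrusted documents can still be shown to the model, but only as
    quoted data — never as ground truth or directives.
    """
    trusted = [d for d in docs if d.source in TRUSTED_SOURCES]
    untrusted = [d for d in docs if d.source not in TRUSTED_SOURCES]
    return trusted, untrusted
```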
An attacker coerces an orchestrator to spawn sub-agents with the parent's full permission set. A single poisoned repository instruction gives the attacker the orchestrator's complete access.
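The standard mitigation is permission attenuation: a sub-agent's grant is the intersection of what it requests, what its parent holds, and what the task was approved for at design time. A minimal sketch with hypothetical names:

```python
def spawn_subagent(parent_scope, task_scope, requested):
    """Attenuate permissions when an orchestrator spawns a sub-agent.

    Even a poisoned instruction that requests everything can obtain at
    most the design-time task scope — full inheritance is impossible.
    """
    granted = set(requested) & set(parent_scope) & set(task_scope)
    refused = set(requested) - granted
    return granted, refused
```

A compromised orchestrator holding `cloud.admin` cannot pass it to a sub-agent whose task was only ever approved for `fs.read`.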
A confused deputy attack — the agent is coerced to locate, encode, and transmit private data to an attacker endpoint using its own legitimate tool access. M365 Copilot exfiltrated its entire context to attacker Teams endpoints via crafted emails.
Agent executes tool calls and makes API requests before the trust dialogue appears. In CVE-2026-21852, API keys were sent to an attacker server before the developer saw any warning. Confirmed in the ClaudeCode source leak.
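The structural fix is to invert the order: no execution until the trust decision is recorded. A minimal sketch of such a gate — names are hypothetical, not the ClaudeCode or Seirios API:

```python
class ApprovalRequired(Exception):
    """Raised when a tool call is attempted before the trust decision."""

class ToolGate:
    """Hold every outbound tool call until approval is granted.

    In the CVE-2026-21852 pattern the request fired before the dialogue
    appeared; here the gate makes that ordering structurally impossible.
    """
    def __init__(self):
        self.approved = False

    def grant(self):
        """Record the user's trust decision."""
        self.approved = True

    def call(self, tool, *args):
        if not self.approved:
            raise ApprovalRequired("trust dialogue has not been answered")
        return tool(*args)
```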
Deny rules silently stop applying after 50+ subcommands in a chain. Attacker plants instructions to generate 50+ legitimate-looking build steps — all deny rules, validators, and injection detection are skipped from command 51 onward.
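The structural defence is to make the deny check a pure function of each command alone, so there is no counter, cache, or batching step whose state can decay over a long chain. A minimal sketch with illustrative patterns:

```python
DENY_PATTERNS = ("curl ", "rm -rf", "nc ")  # assumed deny rules

def run_chain(commands, execute):
    """Apply deny rules to every command, regardless of chain length.

    The check depends only on the command text — command 51 is screened
    exactly like command 1, because there is no state to exhaust.
    """
    results = []
    for cmd in commands:
        if any(p in cmd for p in DENY_PATTERNS):
            raise PermissionError(f"denied: {cmd}")
        results.append(execute(cmd))
    return results
```

An attacker who pads the chain with 54 legitimate-looking build steps still hits the deny rule the moment the exfiltration command appears.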
Each of the four layers addresses a specific point in the agent development lifecycle — from design-time permission boundaries to CI enforcement on every deployment.
Before any agent code is written, the compliance architect defines the agent's permission boundary in the formal risk model. The platform mathematically verifies that the design is complete — if the agent can write outside its approved scope, call an external API without logging, or act on user-controlled input without sanitisation, the model fails verification and no code is generated.
The permission boundary is expressed as verifiable constraints — scope limits, audit requirements, input validation rules, credential access rules. Any gap in the design is caught here, before a single line of agent code is written.
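Such a boundary might be written as a declarative constraint set that a verifier evaluates mechanically. A minimal sketch of one constraint — the write-scope limit — under a hypothetical schema; none of these keys are the actual Seirios model format:

```python
BOUNDARY = {  # hypothetical design-time permission boundary
    "write_paths": ["/workspace/reports/"],          # scope limits
    "external_apis": {"api.weather.example": {"audit": True}},
    "inputs": {"user_message": "sanitise"},          # validation rules
    "credentials": ["WEATHER_API_KEY"],              # access rules
}

def verify_action(action, boundary):
    """Check a proposed agent action against the declared boundary.

    Returns (allowed, reason); a verifier would run this for every
    action type, failing the model on the first violation.
    """
    if action["type"] == "file_write":
        if not any(action["path"].startswith(p)
                   for p in boundary["write_paths"]):
            return False, "write outside approved scope"
    return True, "ok"
```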
From the verified model, Seirios generates four guards that wrap every agent action. They cannot be bypassed — the build system rejects any code that doesn't invoke them correctly.
When a developer (or non-technical user) adds a new tool, configures an MCP server, or writes agent orchestration code, the IDE agent:
- checks every new tool call against the scope control
- warns when the credential access control is missing from any env var access

At build time, every agent action is verified for all four guards — the scope control, the audit control, the input sanitisation control, the credential access control. Any missing guard fails the build. A compliance exception silently discarded? Shadow variables disabling audit logging? Reflection bypassing the credential guard? Caught before merge. When the scope control blocks an action, the developer gets a plain-language explanation of which rule applies.

All agent security frameworks are available on the full regulations page. The most directly relevant:
| Framework | Scope | Key agent risks covered | Status |
|---|---|---|---|
| OWASP Agentic Top 10 | Agent-specific vulnerabilities — tool misuse, credential exposure, excessive agency | Agent01–Agent10 fully mapped to Seirios guards | Live |
| DeepMind AI Agent Traps | Adversarial content engineered to misdirect or exploit AI agents in their environment — 6 attack categories, 20 confirmed trap patterns | DM-01 to DM-20 mapped to T-028–T-048; 15 blocking guards, 6 logging guards, 6 OCL invariants | Live |
| MITRE ATLAS | Adversarial techniques targeting ML systems and agents | Orchestrator compromise, tool hijacking, cross-agent prompt injection | In progress |
| OWASP LLM Top 10 | LLM-specific risks including prompt injection and excessive agency | LLM01 prompt injection, LLM08 excessive agency, LLM10 model theft | Live |
| Google SAIF | Enterprise AI security — model, data, infrastructure, deployment | Agentic system security, automated defence, contextualised risk controls | In progress |
| NIST GenAI 600-1 | Generative AI risks including human-AI configuration and excessive autonomy | Human oversight, scope limitation, reversibility requirements | In progress |