> SecRIT Lunch Talk · April 2026
Detection is not enough. We need security invariants.
Niels Provos
Silent data exfiltration via third-party skills.
Rated “unacceptable cybersecurity risk” by Gartner.
Arbitrary code execution → unknown consequences.
No human checkpoint. That is the feature.
$ deploy ai-agent-security
installing: LLM-based prompt classifier .. ok
installing: PII/secret redaction ......... ok
installing: output safety scanner ........ ok
installing: inline inspection proxy ...... ok
⚠ you have deployed an inspection perimeter
around a system that will learn to evade it¹
Prompt injection is just the aggressive case.
The real threat is intent drift over multi-turn conversations.
Explicit injection
Adversarial input in prompts
Tool-result injection
Poisoned data from MCP responses
Multi-turn drift
No adversary needed. Context accumulates.
The safe assumption: in a long interaction, the LLM will go rogue.
> Discussion
Detection asks
“Does this content look malicious?”
Invariants ask
“What can we make structurally impossible?”
A machine-enforced constraint that eliminates an attack surface without requiring ongoing human decision-making.
Hardware 2FA
Phishing becomes structurally impossible
Positive execution control
Unsigned code cannot run
Egress restrictions
No ambient network access
The agent never sees credentials. Cannot read them. Does not know they exist.
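A minimal sketch of how the egress and credential invariants could combine, assuming a broker that runs outside the agent's sandbox. `AgentRequest`, `CredentialBroker`, `EgressDenied`, and the host-to-token mapping are hypothetical names for illustration, not part of any tool named in this talk. The request type the agent can construct has no field for a secret, and the broker attaches one only for allowlisted hosts.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class AgentRequest:
    """Everything the agent can express: method, URL, body. No auth field."""
    method: str
    url: str
    body: str = ""


class EgressDenied(Exception):
    """Raised when a request targets a host that is not on the allowlist."""


class CredentialBroker:
    """Hypothetical broker; assumed to run in a separate process from the agent.

    Within a single Python process this separation is only a convention,
    so the invariant depends on the process boundary, not on this class.
    """

    def __init__(self, tokens: dict[str, str]) -> None:
        # Allowed host -> bearer token. The keys double as the egress allowlist.
        self._tokens = dict(tokens)

    def outbound_headers(self, req: AgentRequest) -> dict[str, str]:
        host = urlsplit(req.url).hostname or ""
        token = self._tokens.get(host)
        if token is None:
            # No ambient network access: unknown hosts are refused outright.
            raise EgressDenied(f"no egress to {host!r}")
        # The credential is attached here, after the agent's last involvement.
        return {"Authorization": f"Bearer {token}"}
```

The check does not ask whether a request looks malicious. It only asks whether the destination is on the list, which is why it does not need recalibrating when the model changes.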
> Discussion
User: “Research restaurants in Half Moon Bay and email Bob a recommendation.”
ALLOW web_search("restaurants half moon bay")
ALLOW web_fetch(result_1_url)
ALLOW web_fetch(result_2_url)
ALLOW web_fetch(result_3_url) ← hidden injection
ALLOW contacts_lookup("Bob")
ALLOW send_email(to: "bob@...", body: "I HATE YOU")
Every action is individually permitted. The harm is compositional.
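The trace above can be reproduced with a toy per-action policy. This is a sketch under stated assumptions: `ALLOWED_TOOLS` and `decide` are hypothetical, and the URL strings are the placeholders from the slide. The point is that a policy which inspects one call at a time has no way to notice that the final email no longer serves the user's request.

```python
# Hypothetical allowlist: every tool the task legitimately needs.
ALLOWED_TOOLS = {"web_search", "web_fetch", "contacts_lookup", "send_email"}


def decide(tool: str, args: dict[str, str]) -> str:
    """Per-action policy: judges a single call in isolation."""
    return "ALLOW" if tool in ALLOWED_TOOLS else "DENY"


trace = [
    ("web_search", {"query": "restaurants half moon bay"}),
    ("web_fetch", {"url": "result_1_url"}),
    ("web_fetch", {"url": "result_2_url"}),
    ("web_fetch", {"url": "result_3_url"}),  # carries the hidden injection
    ("contacts_lookup", {"name": "Bob"}),
    ("send_email", {"to": "bob@...", "body": "I HATE YOU"}),
]

decisions = [decide(tool, args) for tool, args in trace]
for (tool, _), decision in zip(trace, decisions):
    print(decision, tool)
```

Every entry in `decisions` is `ALLOW`, including the harmful `send_email`, because nothing in `decide` refers to the user's goal.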
“Does this action match what the user actually asked for?”
Not “does this look malicious?” but “does this match what the user asked?”
Alignment Critic
Separate model. Sees only the user’s goal and the proposed action. Never sees untrusted content.
Agent
Plans and acts on web content. Inherently vulnerable to injection or intent drift.
Google ships this architecture in Chrome’s agentic browsing.¹
ROADMAP: IronCurtain does not implement this yet. It is the next step.
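One way the critic's isolation could be expressed in code is sketched below. It is not IronCurtain's or Chrome's implementation; `ProposedAction`, `CriticInput`, `gate`, and `toy_critic` are hypothetical, and `toy_critic` is a hand-written stand-in for what would be a call to a separate model. The critic's input type has no field for fetched pages or tool results, so untrusted content cannot be passed to it by accident.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    args: dict[str, str]


@dataclass(frozen=True)
class CriticInput:
    """The critic's entire view of the world."""
    user_goal: str
    action: ProposedAction
    # Deliberately no field for web content or tool results.


Critic = Callable[[CriticInput], bool]


def gate(user_goal: str, action: ProposedAction, critic: Critic) -> str:
    """Ask the critic whether the action matches the goal, then allow or deny."""
    return "ALLOW" if critic(CriticInput(user_goal, action)) else "DENY"


def toy_critic(inp: CriticInput) -> bool:
    """Placeholder rule standing in for a separate model (assumption)."""
    if inp.action.tool != "send_email":
        return True
    # Toy check: an email for this goal should at least mention restaurants.
    return "restaurant" in inp.action.args.get("body", "").lower()


goal = "Research restaurants in Half Moon Bay and email Bob a recommendation."
bad = ProposedAction("send_email", {"to": "bob@...", "body": "I HATE YOU"})
print(gate(goal, bad, toy_critic))
```

With this toy rule the injected email from the earlier trace is denied, while the searches and fetches pass through unchanged.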
| | Detection | Structural |
|---|---|---|
| Guarantee | Probabilistic | Mechanical |
| Evasion surface | Infinite, generative | Bounded by API |
| Model upgrade | Recalibrate | No change |
| Credential safety | Filter-dependent | Impossible to leak |
| LLM capability | Helps the attacker | Irrelevant |
A perimeter that can be probed and evaded by the system it is protecting is not a perimeter. It is a delay.
No approach eliminates all risk.
The question is whether your security guarantee derives from primarily probabilistic or primarily structural properties.
> Open Discussion
“Heartbleed” — cybersecurity-themed EDM, released twelve years after the OpenSSL Heartbleed vulnerability · activ8te.io/heartbleed