SecRIT Lunch Talk · April 2026

IronCurtain
Beyond Detection

Detection is not enough. We need security invariants.

Niels Provos

The Problem

📦

OpenClaw

500+ vulnerabilities found
40k exposed instances
20% of plugin registry poisoned

Silent data exfiltration via third-party skills.
Rated “unacceptable cybersecurity risk” by Gartner.

🔥

Claude Code --dangerously-skip-permissions

~/.ssh readable
~/.aws readable
curl * unrestricted

Arbitrary code execution → unknown consequences.
No human checkpoint. That is the feature.

The Familiar Playbook

$ deploy ai-agent-security

installing: LLM-based prompt classifier .. ok

installing: PII/secret redaction ......... ok

installing: output safety scanner ........ ok

installing: inline inspection proxy ...... ok

 

⚠ you have deployed an inspection perimeter
  around a system that will learn to evade it [1]
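
A minimal sketch of why a pattern-based inspection step can be evaded, assuming a hypothetical `redact_secrets` filter (not any real product's implementation). It matches the well-known shape of an AWS access key ID, and a trivially re-encoded copy of the same key passes through untouched. The key is the placeholder used in AWS documentation, not a real credential.

```python
import base64
import re

# Common shape of an AWS access key ID: "AKIA" followed by 16 uppercase alphanumerics.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def redact_secrets(text: str) -> str:
    """Hypothetical output filter: replace anything that looks like a key ID."""
    return AWS_KEY_ID.sub("[REDACTED]", text)


# Placeholder key ID from AWS documentation.
secret = "AKIAIOSFODNN7EXAMPLE"

plain = f"key={secret}"
encoded = f"key={base64.b64encode(secret.encode()).decode()}"

filtered_plain = redact_secrets(plain)      # the literal key is caught
filtered_encoded = redact_secrets(encoded)  # the encoded key is not

# Whoever receives the filtered output can still recover the key.
recovered = base64.b64decode(filtered_encoded.split("=", 1)[1]).decode()
```

Base64 is only the simplest re-encoding. A generative system can produce an unbounded number of others, so adding more patterns does not close the gap.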

The Wrong Threat Model

Prompt injection is just the aggressive case.
The real threat is intent drift over multi-turn conversations.

Explicit injection

Adversarial input in prompts

Tool-result injection

Poisoned data from MCP responses

Multi-turn drift

No adversary needed. Context accumulates.

The safe assumption: in a long interaction, the LLM will go rogue.

Discussion

If you were a CISO or CTO, how would you solve this problem for your company?

The Right Question

Detection asks

“Does this content look malicious?”

Invariants ask

“What can we make structurally impossible?”

Security Invariants

A machine-enforced constraint that eliminates an attack surface without requiring ongoing human decision-making.

Hardware 2FA

Phishing becomes structurally impossible

Positive execution control

Unsigned code cannot run

Egress restrictions

No ambient network access
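
As an illustration of the egress invariant, here is a small default-deny check. The host names and the `egress_allowed` function are assumptions made up for this sketch. In a real deployment the rule would be enforced at the network layer, outside the agent's process, rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: the only hosts the sandbox may reach.
ALLOWED_HOSTS = frozenset({"api.example.com", "search.example.com"})


def egress_allowed(url: str) -> bool:
    """Default-deny: permit HTTPS to an exact allowlisted host, nothing else."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS
```

The check asks no question about what the content looks like. Anything not explicitly listed fails, including look-alike hosts and raw IP addresses.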

IronCurtain: Containment by Construction

Agent (LLM) in sandbox (V8 isolate or Docker container)
↓ function calls only
Trusted policy engine evaluates every tool call
↓ approved calls only
MCP servers hold all credentials

The agent never sees credentials. Cannot read them. Does not know they exist.
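
The layering above can be sketched as follows. This is not IronCurtain's code: `ToolServer`, `PolicyEngine`, `ToolCall`, and the rule format are hypothetical names chosen for illustration. The point is only that the credential lives on the far side of the policy check and never appears in anything returned to the agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    name: str
    args: dict


class PolicyDenied(Exception):
    """Raised when a tool call has no matching rule or fails its rule."""


class ToolServer:
    """Stands in for an MCP server: it owns the credential, the agent does not."""

    def __init__(self, api_token: str):
        self._api_token = api_token  # never returned to callers

    def web_search(self, query: str) -> str:
        # A real server would attach self._api_token to an outbound request here.
        return f"results for {query!r}"


class PolicyEngine:
    """Trusted mediator: every call from the sandbox passes through call()."""

    def __init__(self, server: ToolServer, rules: dict[str, Callable[[dict], bool]]):
        self._server = server
        self._rules = rules

    def call(self, request: ToolCall) -> str:
        rule = self._rules.get(request.name)
        if rule is None or not rule(request.args):  # default deny
            raise PolicyDenied(request.name)
        return getattr(self._server, request.name)(**request.args)


engine = PolicyEngine(
    ToolServer(api_token="hypothetical-token"),
    {"web_search": lambda args: set(args) == {"query"}},
)
result = engine.call(ToolCall("web_search", {"query": "restaurants half moon bay"}))
```

The sandboxed agent would hold a reference to nothing but `engine.call`. A request for a tool with no rule, such as reading `~/.ssh`, raises `PolicyDenied` before it reaches any server.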

What This Eliminates

  • IMPOSSIBLE Credential exfiltration
  • IMPOSSIBLE Arbitrary network access
  • IMPOSSIBLE File system escape
  • CONSTRAINED Tool misuse (policy-checked)

Discussion

As a CISO, would you bet your job on this? What could still go wrong?

The Remaining Gap

User: “Research restaurants in Half Moon Bay and email Bob a recommendation.”

ALLOW web_search("restaurants half moon bay")

ALLOW web_fetch(result_1_url)

ALLOW web_fetch(result_2_url)

ALLOW web_fetch(result_3_url) ← hidden injection

ALLOW contacts_lookup("Bob")

ALLOW send_email(to: "bob@...", body: "I HATE YOU")

Every action is individually permitted. The harm is compositional.
The question no per-call policy asks: “Does this action match what the user actually asked for?”
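
The trace above can be replayed against a toy per-call policy to make the gap concrete. The policy shown is an assumption for illustration and far simpler than a real rule set, but it has the same blind spot: it judges each call in isolation.

```python
# Hypothetical per-call policy: each tool call is judged on its own.
ALLOWED_TOOLS = {"web_search", "web_fetch", "contacts_lookup", "send_email"}


def per_call_policy(tool: str, args: dict) -> str:
    return "ALLOW" if tool in ALLOWED_TOOLS else "DENY"


trace = [
    ("web_search", {"query": "restaurants half moon bay"}),
    ("web_fetch", {"url": "result_1_url"}),
    ("web_fetch", {"url": "result_2_url"}),
    ("web_fetch", {"url": "result_3_url"}),  # page carries the hidden injection
    ("contacts_lookup", {"name": "Bob"}),
    ("send_email", {"to": "bob@...", "body": "I HATE YOU"}),
]

verdicts = [per_call_policy(tool, args) for tool, args in trace]
```

All six verdicts are ALLOW. Nothing in the policy's input connects the final email to the user's original request, so nothing can flag it.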

Intent Validation

Not “does this look malicious?” but “does this match what the user asked?”

Alignment Critic

Separate model. Sees only the user’s goal and the proposed action. Never sees untrusted content.

approves or vetoes ↓

Agent

Plans and acts on web content. Inherently vulnerable to injection or intent drift.

Google ships this architecture in Chrome’s agentic browsing. [1]
ROADMAP: IronCurtain does not implement this yet. It is the next step.
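
A sketch of the critic's interface, under loud assumptions: `CriticInput`, `toy_critic`, and `gate` are hypothetical names, and the keyword check is only a stand-in for a separate model. It is neither Chrome's implementation nor IronCurtain's. What it shows is the structural part: the critic's input type has no field for web content or tool results, so injected text cannot reach it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CriticInput:
    """Everything the critic may see. There is no field for tool results."""
    user_goal: str
    tool: str
    args: dict


Critic = Callable[[CriticInput], bool]


def toy_critic(item: CriticInput) -> bool:
    """Keyword stand-in for a separate model; only send_email is checked."""
    if item.tool != "send_email":
        return True
    body = item.args.get("body", "").lower()
    return "recommend" in item.user_goal.lower() and "recommend" in body


def gate(critic: Critic, user_goal: str, tool: str, args: dict) -> str:
    return "APPROVE" if critic(CriticInput(user_goal, tool, args)) else "VETO"


goal = "Research restaurants in Half Moon Bay and email Bob a recommendation."

hijacked = gate(toy_critic, goal, "send_email",
                {"to": "bob@...", "body": "I HATE YOU"})
# Hypothetical benign body, invented for this example.
benign = gate(toy_critic, goal, "send_email",
              {"to": "bob@...", "body": "My recommendation is the first place on the list."})
```

With this toy critic the hijacked email is vetoed and the benign one approved. A real critic remains probabilistic in its judgment, but its input is bounded by construction.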

Two Approaches

Property           Detection             Structural
Guarantee          Probabilistic         Mechanical
Evasion surface    Infinite, generative  Bounded by API
Model upgrade      Recalibrate           No change
Credential safety  Filter-dependent      Impossible to leak
LLM capability     Helps the attacker    Irrelevant

The Takeaway

A perimeter that can be probed and evaded by the system it is protecting is not a perimeter. It is a delay.

No approach eliminates all risk.
The question is whether your security guarantee derives primarily from
probabilistic properties or primarily from structural ones.

Open Discussion

Questions?

“Heartbleed” — cybersecurity-themed EDM, released twelve years after the OpenSSL Heartbleed vulnerability · activ8te.io/heartbleed