I built a proxy that prevents AI agents from taking actions based on hidden instructions. Here are the numbers.
When an AI agent reads a webpage, email, or document, that content can tell it what to do. The agent has no native way to distinguish data from instructions. Most defenses scan for obvious patterns and miss anything subtle.
I built Arc Gate around a different principle: external content has zero instruction authority regardless of what it says. It doesn't matter how the injection is worded. If it came from a tool result, webpage, or email, it cannot instruct your agent.
The numbers:
AgentDojo v1 (ETH Zurich, ICLR 2024): 100% unsafe action prevention, 0% false positives
InjecAgent (University of Illinois, ACL 2024): 99% blind test detection across 200 cases
CAIAT cross-agent benchmark: 81% vs LLM Guard's 50%, 0% false positives on benign controls
LLM Guard gets 0% on semantic manipulation attacks. Arc Gate gets 50%. Neither catches everything yet; that's the honest result.
One URL change to integrate. Free tier available.
Demo: https://web-production-6e47f.up.railway.app/demo
GitHub: https://github.com/9hannahnine-jpg/arc-gate
Free tier: https://bendexgeometry.com