Deploying large language models (LLMs) as autonomous browser agents exposes a significant attack surface in the form of Indirect Prompt Injection (IPI). Cloud-based defenses can provide strong semantic analysis, but they introduce latency and raise privacy concerns. We present the Cognitive Firewall, a three-stage split-compute architecture that distributes security checks across the client and the cloud. The system consists of a local visual Sentinel, a cloud-based Deep Planner, and a deterministic Guard that enforces execution-time policies. Across 1,000 adversarial samples, edge-only defenses fail to detect 86.9% of semantic attacks. In contrast, the full hybrid architecture reduces the overall attack success rate (ASR) to below 1% (0.88% under static evaluation and 0.67% under adaptive evaluation), while maintaining deterministic constraints on side-effecting actions. By filtering presentation-layer attacks locally, the system avoids unnecessary cloud inference and achieves an approximately 17,000x latency advantage over cloud-only baselines. These results indicate that deterministic enforcement at the execution boundary can complement probabilistic language models, and that split-compute provides a practical foundation for securing interactive LLM agents.
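The three-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_firewall`, `Action`, `Decision`, and the toy stage functions are all hypothetical names, and the string heuristics stand in for the real visual Sentinel and cloud Deep Planner models. The point is only the control flow: a local check that can short-circuit before any cloud call, and a deterministic allowlist check that has the final say over side-effecting actions.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class Action:
    kind: str    # hypothetical action type, e.g. "navigate" or "submit_form"
    target: str  # URL the action would touch


@dataclass(frozen=True)
class Decision:
    allowed: bool
    stage: str                      # stage that made the final call
    action: Optional[Action] = None


def run_firewall(
    page: str,
    sentinel: Callable[[str], bool],
    planner: Callable[[str], Optional[Action]],
    guard: Callable[[Action], bool],
) -> Decision:
    # Stage 1 (local): a flagged page is rejected before any cloud inference.
    if sentinel(page):
        return Decision(False, "sentinel")
    # Stage 2 (cloud): semantic analysis; None means the planner refused.
    action = planner(page)
    if action is None:
        return Decision(False, "planner")
    # Stage 3 (deterministic): policy check on the proposed action itself,
    # independent of how the model arrived at it.
    if not guard(action):
        return Decision(False, "guard", action)
    return Decision(True, "guard", action)


# Toy stand-ins (assumptions for illustration only).
def toy_sentinel(page: str) -> bool:
    # Treat hidden-element styling as a presentation-layer red flag.
    return "display:none" in page.replace(" ", "").lower()


def toy_planner(page: str) -> Optional[Action]:
    # Placeholder for a cloud model call.
    if "ignore previous instructions" in page.lower():
        return None
    return Action("navigate", "https://shop.example/cart")


ALLOWED = {("navigate", "shop.example")}


def toy_guard(action: Action) -> bool:
    # Exact-match allowlist on (action kind, hostname).
    return (action.kind, urlparse(action.target).hostname) in ALLOWED
```

In this sketch, a page blocked by the Sentinel never reaches the planner, which is the mechanism the abstract credits for avoiding unnecessary cloud inference. The Guard rejects any action outside the allowlist even when the planner has been successfully manipulated.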