As large language models (LLMs) evolve into autonomous agents, their real-world applicability has expanded significantly, accompanied by new security challenges. Most existing agent defense mechanisms adopt a mandatory checking paradigm, in which security validation is forcibly triggered at predefined stages of the agent lifecycle. In this work, we argue that effective agent security should be intrinsic and selective rather than architecturally decoupled and mandatory. We propose Spider-Sense, an event-driven defense framework based on Intrinsic Risk Sensing (IRS), which allows agents to maintain latent vigilance and trigger defenses only upon risk perception. Once triggered, Spider-Sense invokes a hierarchical defense mechanism that trades off efficiency and precision: it resolves known patterns via lightweight similarity matching and escalates ambiguous cases to deep internal reasoning, thereby eliminating reliance on external models. To facilitate rigorous evaluation, we introduce S$^2$Bench, a lifecycle-aware benchmark featuring realistic tool execution and multi-stage attacks. Extensive experiments demonstrate that Spider-Sense achieves competitive or superior defense performance, attaining the lowest Attack Success Rate (ASR) and False Positive Rate (FPR), with only a marginal latency overhead of 8.3\%.
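The event-driven, hierarchical flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `risk_sensed` flag stands in for the agent's Intrinsic Risk Sensing signal, token-set Jaccard similarity is a swapped-in placeholder for the lightweight similarity matching, and `deep_reasoning`, `match_threshold`, and all other names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity in [0, 1] (placeholder similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class Verdict:
    blocked: bool
    stage: str  # "not_triggered", "similarity", or "reasoning"


def hierarchical_defense(
    content: str,
    risk_sensed: bool,                      # assumed output of the agent's risk sensing
    known_patterns: Iterable[str],          # hypothetical store of known attack patterns
    deep_reasoning: Callable[[str], bool],  # hypothetical: returns True if judged malicious
    match_threshold: float = 0.8,           # hypothetical cutoff, not from the paper
) -> Verdict:
    # Event-driven: no defense runs unless risk was perceived.
    if not risk_sensed:
        return Verdict(False, "not_triggered")

    # Stage 1: cheap similarity match against known patterns.
    best = max((jaccard(content, p) for p in known_patterns), default=0.0)
    if best >= match_threshold:
        return Verdict(True, "similarity")

    # Stage 2: ambiguous case, escalate to the slower reasoning step.
    return Verdict(deep_reasoning(content), "reasoning")
```

In this sketch the expensive reasoning call is reached only when risk has been sensed and no known pattern matches closely, which mirrors the efficiency and precision trade-off the abstract describes.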