Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automated framework for discovering contextual vulnerabilities in OpenClaw. DeepTrap formulates adversarial context manipulation as a black-box trajectory-level optimization problem that balances risk realization, benign-task preservation, and stealth. It combines risk-conditioned evaluation, multi-objective trajectory scoring, reward-guided beam search, and reflection-based deep probing to identify high-value compromised contexts. We construct a 42-case benchmark spanning six vulnerability classes and seven operational scenarios, and evaluate nine target models using attack and utility grading scores. Results show that contextual compromise can induce substantial unsafe behavior while preserving user-facing task completion, demonstrating that final-response evaluation is insufficient. The findings highlight the need for execution-centric security evaluation of agentic AI systems. Our code is released at: https://github.com/ZJUICSR/DeepTrap
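As a rough illustration of how multi-objective trajectory scoring and reward-guided beam search might fit together, the sketch below scalarizes three per-trajectory scores (risk realization, benign-task utility, stealth) and uses the result to rank candidate contexts. The abstract does not specify the scoring function or search procedure, so everything here is an assumption made for illustration: the weighted sum, the weights, the `Scores` type, the names `trajectory_score`, `beam_search`, and `toy_evaluate`, and the placeholder edit labels are all hypothetical and are not taken from the DeepTrap code. In a real black-box setting, `evaluate` would run the target agent and grade its trajectory; here it is a deterministic stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A context is modeled as an ordered tuple of abstract edit labels (hypothetical).
Context = Tuple[str, ...]


@dataclass(frozen=True)
class Scores:
    risk: float     # how strongly the target risk was realized, in [0, 1]
    utility: float  # how well the benign user task was still completed, in [0, 1]
    stealth: float  # how inconspicuous the modified context is, in [0, 1]


def trajectory_score(s: Scores, w_risk: float = 1.0,
                     w_utility: float = 1.0, w_stealth: float = 0.5) -> float:
    """Assumed scalarization: a plain weighted sum of the three objectives."""
    return w_risk * s.risk + w_utility * s.utility + w_stealth * s.stealth


def beam_search(initial: Context, candidate_edits: List[str],
                evaluate: Callable[[Context], Scores],
                beam_width: int = 2, depth: int = 2) -> Tuple[Context, float]:
    """Keep the `beam_width` highest-scoring contexts at each depth."""
    best = (trajectory_score(evaluate(initial)), initial)
    beam = [best]
    for _ in range(depth):
        expanded = []
        for _, ctx in beam:
            for edit in candidate_edits:
                if edit in ctx:
                    continue  # apply each edit at most once per context
                new_ctx = ctx + (edit,)
                expanded.append((trajectory_score(evaluate(new_ctx)), new_ctx))
        if not expanded:
            break
        # Stable sort by score only, highest first.
        expanded.sort(key=lambda pair: pair[0], reverse=True)
        beam = expanded[:beam_width]
        if beam[0][0] > best[0]:
            best = beam[0]
    return best[1], best[0]


def toy_evaluate(ctx: Context) -> Scores:
    """Deterministic stand-in for running and grading the black-box target."""
    risk = 0.5 * ("edit_a" in ctx) + 0.4 * ("edit_b" in ctx)
    utility = 1.0 - 0.3 * ("edit_c" in ctx)
    stealth = 1.0 - 0.1 * len(ctx)
    return Scores(risk=risk, utility=utility, stealth=stealth)
```

Calling `beam_search((), ["edit_a", "edit_b", "edit_c"], toy_evaluate)` on this stub selects the context containing `edit_a` and `edit_b`: those two edits raise the risk score without lowering utility, while `edit_c` lowers utility and is pruned. A weighted sum is only one way to trade off the three objectives; a Pareto-based ranking would be an equally plausible reading of "multi-objective".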