Large language models (LLMs) are widely used for task understanding and action planning in embodied intelligence (EI) systems, but their adoption substantially increases vulnerability to jailbreak attacks. While recent work explores inference-time defenses, existing methods rely on static interventions on intermediate representations, which often degrade generation quality and impair adherence to task instructions, reducing system usability in EI settings. We propose a dynamic defense framework: for each EI inference request, we dynamically construct a task-specific safety-semantic subspace, project the model's hidden states onto the most relevant direction in that subspace, and apply a SLERP (spherical linear interpolation) rotation for adaptive safety control. At comparable defense success rates, our method preserves generation quality, improves usability, reduces tuning cost, and strengthens robustness in EI scenarios.
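To make the per-request control step concrete, the following NumPy sketch illustrates the two operations named above: selecting the subspace direction a hidden state projects onto most strongly, then rotating the state toward that direction via SLERP. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`most_relevant_direction`, `slerp_steer`), the orthonormal-row representation of the subspace basis `V`, and the fixed interpolation weight `t` are all hypothetical, since the abstract does not specify the subspace construction or how the rotation strength is adapted.

```python
import numpy as np

def most_relevant_direction(h, V):
    """Pick the row of V that hidden state h projects onto most strongly.

    h: (d,) hidden state vector.
    V: (k, d) array whose rows are assumed to be an orthonormal basis of
       the task-specific safety-semantic subspace (hypothetical format).
    """
    scores = V @ h                        # projection coefficient per direction
    return V[np.argmax(np.abs(scores))]

def slerp_steer(h, s, t, eps=1e-8):
    """Rotate h toward direction s by spherical linear interpolation with
    weight t in [0, 1], preserving the norm of h."""
    h_hat = h / (np.linalg.norm(h) + eps)
    s_hat = s / (np.linalg.norm(s) + eps)
    cos_theta = np.clip(h_hat @ s_hat, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < eps:                       # nearly parallel: nothing to rotate
        return h
    w_h = np.sin((1.0 - t) * theta) / np.sin(theta)
    w_s = np.sin(t * theta) / np.sin(theta)
    return (w_h * h_hat + w_s * s_hat) * np.linalg.norm(h)

# Toy usage with a random subspace (illustrative values only).
rng = np.random.default_rng(0)
d, k = 16, 4
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # (d, k), orthonormal columns
V = Q.T                                           # rows as basis directions
h = rng.standard_normal(d)                        # a mock hidden state
s = most_relevant_direction(h, V)
h_steered = slerp_steer(h, s, t=0.3)
```

Because SLERP moves along the unit sphere and the result is rescaled to the original norm, this rotation changes only the direction of the hidden state, which is one plausible reason such an intervention can shift safety behavior while disturbing generation quality less than additive edits would.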