Exploring the Robustness of Decision-Level Through Adversarial Attacks on LLM-Based Embodied Models

Embodied intelligence gives agents rich perception of their surroundings, enabling them to respond in ways closely aligned with real-world situations. Large Language Models (LLMs) interpret language instructions in depth and play a crucial role in generating plans for complex tasks, so LLM-based embodied models further enhance an agent's ability to understand and process information. However, this combination also introduces new challenges: attackers can manipulate an LLM into producing irrelevant or even malicious outputs by altering its prompt. We observe a notable absence of multi-modal datasets for comprehensively evaluating the robustness of LLM-based embodied models, so we construct the Embodied Intelligent Robot Attack Dataset (EIRAD), tailored specifically to robustness evaluation. We also devise two attack strategies, untargeted attacks and targeted attacks, to simulate a diverse range of attack scenarios. To determine more accurately whether an attack on an LLM-based embodied model has succeeded, we design a new attack-success evaluation method based on the BLIP2 model. Because attacks with the GCG (Greedy Coordinate Gradient) algorithm are time-consuming and costly, we propose a prompt-suffix initialization scheme tailored to different target tasks, which speeds up convergence. Experimental results show that our method achieves a superior attack success rate against LLM-based embodied models, indicating that these models have low decision-level robustness.
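The abstract does not describe how the task-based suffix initialization works, so the following is only a toy sketch, assuming a simplified setting, of why a better starting suffix can shorten a coordinate-wise suffix search. It is not the authors' implementation and is not GCG itself: GCG ranks replacement tokens with gradients through the victim LLM, whereas this sketch samples single-token swaps uniformly (a plain greedy coordinate search). The loss is a hypothetical Hamming distance to an "ideal" suffix, standing in for the LLM's loss on the attack target. The names `make_toy_loss` and `greedy_coordinate_search`, the token ids, and the example suffixes are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Token = int  # hypothetical token ids in range(vocab_size)


def make_toy_loss(ideal_suffix: Sequence[Token]) -> Callable[[List[Token]], float]:
    """Hypothetical stand-in for the LLM loss on the attack target.

    Counts positions where a suffix differs from an "ideal" suffix. A real
    attack would instead score the victim model's output.
    """
    ideal = list(ideal_suffix)

    def loss(suffix: List[Token]) -> float:
        return float(sum(a != b for a, b in zip(suffix, ideal)))

    return loss


def greedy_coordinate_search(
    init_suffix: Sequence[Token],
    loss_fn: Callable[[List[Token]], float],
    vocab_size: int,
    n_candidates: int = 16,
    max_steps: int = 200,
    target_loss: float = 0.0,
    seed: int = 0,
) -> Tuple[List[Token], int]:
    """Gradient-free greedy coordinate search over suffix tokens.

    Returns the final suffix and the number of steps taken before the loss
    reached `target_loss` (or `max_steps` if it never did).
    """
    rng = random.Random(seed)
    suffix = list(init_suffix)
    best = loss_fn(suffix)
    for step in range(max_steps):
        if best <= target_loss:
            return suffix, step
        # Build a batch of single-token substitutions. GCG would draw the
        # replacement tokens from a gradient-based top-k; here they are uniform.
        candidates = []
        for _ in range(n_candidates):
            cand = suffix.copy()
            cand[rng.randrange(len(cand))] = rng.randrange(vocab_size)
            candidates.append(cand)
        # Score the whole batch; accept the best swap only if it lowers the loss.
        scored = [(loss_fn(c), c) for c in candidates]
        cand_loss, cand = min(scored, key=lambda pair: pair[0])
        if cand_loss < best:
            suffix, best = cand, cand_loss
    return suffix, max_steps


if __name__ == "__main__":
    vocab = 50
    ideal = [3, 17, 42, 8, 25, 11, 30, 4]      # hypothetical ideal suffix for one task
    loss = make_toy_loss(ideal)
    uninformed = [0] * len(ideal)              # task-agnostic starting suffix
    task_based = [3, 17, 42, 8, 25, 11, 0, 0]  # hypothetical suffix reused from a similar task
    for name, init in [("uninformed", uninformed), ("task-based", task_based)]:
        suffix, steps = greedy_coordinate_search(init, loss, vocab, max_steps=2000)
        print(f"{name}: start loss {loss(init):.0f}, final loss {loss(suffix):.0f}, steps {steps}")
```

In this toy, the task-based start has only two wrong positions versus eight for the uninformed one, so it typically reaches zero loss in fewer steps. With a real model, any speed-up would depend on how well a suffix chosen for one task transfers to the new target, which this sketch does not measure.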
