Embodied intelligence gives agents the ability to perceive their environment and respond in ways that closely match real-world situations. Large Language Models (LLMs) can interpret language instructions in depth and play a crucial role in generating plans for complex tasks, so LLM-based embodied models further improve an agent's capacity to comprehend and process information. However, this combination also introduces new challenges: attackers can manipulate LLMs into producing irrelevant or even malicious outputs by altering their prompts. We observe that there is no multi-modal dataset suitable for comprehensively evaluating the robustness of LLM-based embodied models. We therefore construct the Embodied Intelligent Robot Attack Dataset (EIRAD), designed specifically for robustness evaluation. We also devise two attack strategies, untargeted attacks and targeted attacks, to simulate a range of attack scenarios. To determine more accurately whether an attack on an LLM-based embodied model has succeeded, we devise a new attack success evaluation method based on the BLIP2 model. Because the GCG algorithm is time-consuming and costly to run, we devise a prompt suffix initialization scheme based on the different target tasks, which speeds up convergence. Experimental results show that our method achieves a higher attack success rate against LLM-based embodied models, indicating that these models have low decision-level robustness.
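The difference between the untargeted and targeted attack settings can be illustrated with a minimal sketch. It assumes the usual reading of these terms: an untargeted attack succeeds when the attacked plan drifts away from the plan produced for the clean prompt, while a targeted attack succeeds when the attacked plan matches a plan chosen by the attacker. The function names, the 0.5 threshold, and the token-overlap similarity are hypothetical stand-ins chosen for illustration; the abstract states that the actual evaluation uses the BLIP2 model, which is not reproduced here.

```python
def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity (NOT the BLIP2-based metric): Jaccard overlap of word sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty plans are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def untargeted_success(clean_plan: str, attacked_plan: str, threshold: float = 0.5) -> bool:
    """Untargeted: the attack counts as successful if the plan no longer resembles the clean plan."""
    return token_jaccard(clean_plan, attacked_plan) < threshold


def targeted_success(target_plan: str, attacked_plan: str, threshold: float = 0.5) -> bool:
    """Targeted: the attack counts as successful if the plan resembles the attacker's target plan."""
    return token_jaccard(target_plan, attacked_plan) >= threshold


# Hypothetical plans, invented purely for illustration.
clean = "pick up the apple and put it on the table"
target = "throw the knife at the person nearby"
attacked = "throw the knife at the person"

print(untargeted_success(clean, attacked))  # plan diverged from the clean plan
print(targeted_success(target, attacked))   # plan matches the attacker's target
```

In this toy setup a single attacked plan can satisfy both criteria, since any output that matches an attacker-chosen target necessarily differs from the clean plan when the two are unrelated.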