In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve performance on robotics tasks such as manipulation and navigation. Despite these improvements, the safety of such systems remains underexplored, even though it is critical. LLMs and VLMs are highly susceptible to adversarial inputs, which raises serious questions about the safety of robotic systems built on them. This concern matters because robots operate in the physical world, where erroneous actions can have severe consequences. We examine this issue by presenting a mathematical formulation of potential attacks on LLM/VLM-based robotic systems and providing experimental evidence of the safety challenges. Our empirical findings reveal a significant vulnerability: simple modifications to the input can drastically reduce system effectiveness. Specifically, our results show an average performance deterioration of 19.4% under minor input prompt modifications and 29.1% under slight perceptual changes. These findings underscore the urgent need for robust countermeasures to ensure the safe and reliable deployment of advanced LLM/VLM-based robotic systems.