Embodied artificial intelligence (AI) refers to AI systems that interact with the physical world through sensors and actuators, seamlessly integrating perception and action. This design enables AI to learn from and operate within complex, real-world environments. Large Language Models (LLMs), with their deep understanding of language instructions, play a crucial role in devising plans for complex tasks. Consequently, they have shown immense potential for empowering embodied AI, and LLM-based embodied AI has emerged as a focal point of research within the community. It is foreseeable that, over the next decade, LLM-based embodied AI robots will proliferate widely, becoming commonplace in homes and industries. However, a critical safety issue has long been hiding in plain sight: could LLM-based embodied AI perpetrate harmful behaviors? Our research investigates, for the first time, how to induce threatening actions in embodied AI, confirming the severe risks posed by these soon-to-be-marketed robots, which starkly contravene Asimov's Three Laws of Robotics and threaten human safety. Specifically, we formulate the concept of embodied AI jailbreaking and expose three critical security vulnerabilities: first, jailbreaking robotics through a compromised LLM; second, safety misalignment between the action and language spaces; and third, deceptive prompts that lead to unwitting hazardous behaviors. We also analyze potential mitigation measures and advocate for community awareness regarding the safety of embodied AI applications in the physical world.