Embodied artificial intelligence (AI) refers to AI systems that interact with the physical world through sensors and actuators, seamlessly integrating perception and action. This design enables AI to learn from and operate within complex, real-world environments. Large Language Models (LLMs), with their deep understanding of language instructions, play a crucial role in devising plans for complex tasks. Consequently, they have shown immense potential for empowering embodied AI, and LLM-based embodied AI has emerged as a focal point of research within the community. It is foreseeable that, over the next decade, LLM-based embodied AI robots will proliferate widely, becoming commonplace in homes and industries. However, a critical safety issue has long been hiding in plain sight: could LLM-based embodied AI perpetrate harmful behaviors? Our research investigates, for the first time, how to induce threatening actions in embodied AI, confirming the severe risks posed by these soon-to-be-marketed robots, which starkly contravene Asimov's Three Laws of Robotics and threaten human safety. Specifically, we formulate the concept of embodied AI jailbreaking and expose three critical security vulnerabilities: first, jailbreaking robotics through a compromised LLM; second, safety misalignment between the action and language spaces; and third, deceptive prompts that lead to unwitting hazardous behaviors. We also analyze potential mitigation measures and advocate for community awareness regarding the safety of embodied AI applications in the physical world.