Robotic manipulation refers to robots autonomously handling and interacting with objects using advanced techniques from robotics and artificial intelligence. The advent of powerful tools such as large language models (LLMs) and large vision-language models (LVLMs) has significantly enhanced these robots' capabilities in environmental perception and decision-making. However, introducing these intelligent agents has also brought security threats such as jailbreak attacks and adversarial attacks. In this research, we take a further step by proposing a backdoor attack specifically targeting robotic manipulation and, for the first time, implementing a backdoor attack in the physical world. By embedding a backdoored vision-language model into the visual perception module of the robotic system, we successfully mislead the robotic arm's operation in the physical world whenever common items serving as triggers are present. Experimental evaluations in the physical world demonstrate the effectiveness of the proposed backdoor attack.