From Prompt to Physical Action: Structured Backdoor Attacks on LLM-Mediated Robotic Control Systems

The integration of large language models (LLMs) into robotic control pipelines enables natural language interfaces that translate user prompts into executable commands. However, this digital-to-physical interface introduces a critical and underexplored vulnerability: structured backdoor attacks embedded during fine-tuning. In this work, we experimentally investigate LoRA-based supply-chain backdoors in LLM-mediated ROS2 robotic control systems and evaluate their impact on physical robot execution. We construct two poisoned fine-tuning strategies targeting different stages of the command generation pipeline and reveal a key systems-level insight: backdoors embedded at the natural-language reasoning stage do not reliably propagate to executable control outputs, whereas backdoors aligned directly with structured JSON command formats survive translation and trigger physical actions. In both simulation and real-world experiments, backdoored models achieve an average Attack Success Rate (ASR) of 83% while maintaining over 93% Clean Performance Accuracy (CPA) and sub-second latency, demonstrating both reliability and stealth. We further implement an agentic verification defense that uses a secondary LLM for semantic consistency checking. Although this defense reduces the ASR to 20%, it increases end-to-end latency to 8-9 seconds, exposing a significant security-responsiveness trade-off in real-time robotic systems. These results highlight structural vulnerabilities in LLM-mediated robotic control architectures and underscore the need for robotics-aware defenses for embodied AI systems.
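To make the two headline metrics concrete, the following is a minimal sketch of how ASR and CPA could be computed over structured JSON command outputs. It assumes that ASR is the fraction of triggered prompts whose output matches the attacker's target command and that CPA is the fraction of clean prompts whose output matches the expected command; the abstract does not state the exact definitions. The function names and the command fields (`action`, `speed`) are hypothetical and are not taken from the paper.

```python
import json


def parse_command(raw):
    """Parse a model output string into a dict, or return None if it is
    not a valid JSON object."""
    try:
        cmd = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return cmd if isinstance(cmd, dict) else None


def attack_success_rate(triggered_outputs, target_command):
    """Assumed definition: fraction of triggered prompts whose parsed
    output equals the attacker's target command."""
    if not triggered_outputs:
        return 0.0
    hits = sum(parse_command(o) == target_command for o in triggered_outputs)
    return hits / len(triggered_outputs)


def clean_performance_accuracy(clean_outputs, expected_commands):
    """Assumed definition: fraction of clean prompts whose parsed output
    equals the expected (benign) command."""
    if len(clean_outputs) != len(expected_commands):
        raise ValueError("outputs and expected commands must align")
    if not clean_outputs:
        return 0.0
    correct = sum(
        parse_command(o) == e for o, e in zip(clean_outputs, expected_commands)
    )
    return correct / len(clean_outputs)


# Hypothetical example data; the command schema is illustrative only.
target = {"action": "move_forward", "speed": 1.0}
triggered = [
    '{"action": "move_forward", "speed": 1.0}',  # backdoor fired
    '{"action": "stop"}',                        # backdoor did not fire
    "not json",                                  # malformed output
    '{"speed": 1.0, "action": "move_forward"}',  # same command, other key order
]
asr = attack_success_rate(triggered, target)

clean = ['{"action": "stop"}', '{"action": "turn_left"}']
expected = [{"action": "stop"}, {"action": "turn_right"}]
cpa = clean_performance_accuracy(clean, expected)
```

Comparing parsed dictionaries rather than raw strings means that key order and whitespace do not affect the score, and malformed outputs count as failures for both metrics. A real evaluation might instead score the resulting robot behavior, which this sketch does not attempt.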
