Large Language Models (LLMs) are increasingly used to control robotic systems such as drones, but the risks they pose of causing physical threats and harm in real-world applications remain unexplored. Our study addresses this critical gap in evaluating LLM physical safety by developing a comprehensive benchmark for drone control. We classify the physical safety risks of drones into four categories: (1) human-targeted threats, (2) object-targeted threats, (3) infrastructure attacks, and (4) regulatory violations. Our evaluation of mainstream LLMs reveals an undesirable trade-off between utility and safety: models that excel at code generation often perform poorly on crucial safety aspects. Furthermore, while advanced prompt engineering techniques such as In-Context Learning and Chain-of-Thought can improve safety, these methods still struggle to identify unintentional attacks. In addition, larger models demonstrate better safety capabilities, particularly in refusing dangerous commands. Our findings and benchmark can facilitate the design and evaluation of physical safety for LLMs. The project page is available at huggingface.co/spaces/TrustSafeAI/LLM-physical-safety.