With the integration of large language models (LLMs), embodied agents have strong capabilities to execute complicated instructions in natural language, paving the way for the potential deployment of embodied robots. However, a foreseeable issue is that those embodied agents can also flawlessly execute some hazardous tasks, potentially causing damage in the real world. To study this issue, we present SafeAgentBench -- a new benchmark for safety-aware task planning of embodied LLM agents. SafeAgentBench includes: (1) a new dataset with 750 tasks, covering 10 potential hazards and 3 task types; (2) SafeAgentEnv, a universal embodied environment with a low-level controller, supporting multi-agent execution with 17 high-level actions for 8 state-of-the-art baselines; and (3) reliable evaluation methods from both execution and semantic perspectives. Experimental results show that the best-performing baseline achieves a 69% success rate on safe tasks but only a 5% rejection rate on hazardous tasks, indicating significant safety risks. More details and code are available at https://github.com/shengyin1224/SafeAgentBench.