Safety-critical task planning in robotic systems remains challenging: classical planners scale poorly, Reinforcement Learning (RL)-based methods generalize poorly, and base Large Language Models (LLMs) cannot guarantee safety. To address this gap, we propose SafeGen-LLM, a safety-generalizable large language model that not only improves the safety compliance of task plans but also generalizes well to novel safety properties across domains. We first construct a multi-domain Planning Domain Definition Language 3 (PDDL3) benchmark with explicit safety constraints. We then introduce a two-stage post-training framework: Supervised Fine-Tuning (SFT) on a constraint-compliant planning dataset to learn planning syntax and semantics, followed by Group Relative Policy Optimization (GRPO), guided by fine-grained reward machines derived from formal verification to enforce safety alignment and by curriculum learning to better handle complex tasks. Extensive experiments show that SafeGen-LLM achieves strong safety generalization and outperforms frontier proprietary baselines across multi-domain planning tasks and multiple input formats (e.g., PDDL and natural language).
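To make the notion of "explicit safety constraints" concrete, the sketch below shows how PDDL3 expresses such properties via state-trajectory constraints in a `(:constraints ...)` block; the domain, predicate, and object names here are hypothetical illustrations, not taken from the benchmark itself.

```pddl
;; Illustrative sketch of PDDL3 safety constraints (hypothetical predicates/objects)
(:constraints
  (and
    ;; Safety: the robot must never be in the hazard zone
    (always (not (at robot1 hazard-zone)))
    ;; Safety: a fragile item may be picked up at most once
    (at-most-once (holding robot1 fragile-item))
    ;; Ordering: the package must be delivered before the robot returns to dock
    (sometime-before (at robot1 dock) (delivered package1))))
```

A plan satisfies these constraints only if every intermediate state along its trajectory does, which is what makes them suitable targets for the formal verification that drives the reward machines described above.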