Recent advances in robot learning increasingly rely on LLM-based task planning, leveraging the ability of these models to bridge natural language and executable actions. While prior work has demonstrated strong performance, widespread adoption of these models in robotics has been challenging because 1) existing methods are often closed-source or computationally intensive, neglecting actual deployment on real-world physical systems, and 2) there is no universally accepted, plug-and-play representation for robotic task generation. Addressing these challenges, we propose BTGenBot-2, a 1B-parameter open-source small language model that directly converts natural language task descriptions and a list of robot action primitives into executable behavior trees in XML. Unlike prior approaches, BTGenBot-2 enables zero-shot BT generation and error recovery at both inference time and runtime, while remaining lightweight enough for resource-constrained robots. We further introduce the first standardized benchmark for LLM-based BT generation, covering 52 navigation and manipulation tasks in NVIDIA Isaac Sim. Extensive evaluations demonstrate that BTGenBot-2 consistently outperforms GPT-5, Claude Opus 4.1, and larger open-source models on both functional and non-functional metrics, achieving average success rates of 90.38% in zero-shot and 98.07% in one-shot settings, while delivering up to 16x faster inference than the previous BTGenBot.
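For illustration, an executable behavior tree of the kind described above might look like the following. This is a minimal sketch in BehaviorTree.CPP-style XML (the `<root>`/`<BehaviorTree>` layout and the `RetryUntilSuccessful` decorator follow that library's schema); the action nodes `MoveTo`, `Pick`, and `Place` are hypothetical robot primitives for this example, not taken from the paper:

```xml
<!-- Hypothetical fetch-and-deliver task: navigate, pick with retries, deliver. -->
<root BTCPP_format="4">
  <BehaviorTree ID="MainTree">
    <Sequence name="fetch_and_deliver">
      <MoveTo goal="kitchen"/>
      <!-- Retry the grasp up to 3 times before the tree fails. -->
      <RetryUntilSuccessful num_attempts="3">
        <Pick object="cup"/>
      </RetryUntilSuccessful>
      <MoveTo goal="table"/>
      <Place object="cup" location="table"/>
    </Sequence>
  </BehaviorTree>
</root>
```

A model outputting this representation can be dropped into an existing BT executor, which is what makes the XML form attractive as a plug-and-play target for task generation.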