Curriculum learning is a training mechanism in reinforcement learning (RL) that facilitates learning complex policies by progressively increasing task difficulty during training. However, designing an effective curriculum for a specific task often requires extensive domain knowledge and human intervention, which limits its applicability across domains. Our core idea is that large language models (LLMs), trained extensively on diverse language data and able to encapsulate world knowledge, have significant potential to efficiently break down tasks and decompose skills across various robotics environments. Additionally, the demonstrated success of LLMs in translating natural language into executable code for RL agents strengthens the case for using them to generate task curricula. In this work, we propose CurricuLLM, which leverages the high-level planning and programming capabilities of LLMs for curriculum design, thereby enabling more efficient learning of complex target tasks. CurricuLLM consists of: (Step 1) generating, in natural language, a sequence of subtasks that aid learning of the target task; (Step 2) translating the natural language description of each subtask into executable task code, including reward code and goal distribution code; and (Step 3) evaluating trained policies based on trajectory rollouts and the subtask description. We evaluate CurricuLLM in various robotics simulation environments spanning manipulation, navigation, and locomotion, and show that CurricuLLM can aid the learning of complex robot control tasks. In addition, we validate a humanoid locomotion policy learned through CurricuLLM in the real world. The code is available at https://github.com/labicon/CurricuLLM
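To make the three-step structure concrete, here is a minimal sketch of the control flow only, not the CurricuLLM implementation. The names `curriculum_loop`, `propose_subtasks`, `translate`, `train`, `score`, `TaskCode`, and `num_candidates` are hypothetical. The LLM calls, the RL trainer, and the rollout-based evaluator are passed in as plain callables. Two further choices are assumptions, not claims from the abstract: several candidate task codes are sampled per subtask, and each subtask's training is warm-started from the previously selected policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TaskCode:
    """Executable task code for one subtask (output of Step 2)."""
    reward_code: str  # source text of a reward function
    goal_code: str    # source text of a goal-distribution sampler


def curriculum_loop(
    target_task: str,
    propose_subtasks: Callable[[str], List[str]],          # Step 1: hypothetical LLM call
    translate: Callable[[str, int], TaskCode],             # Step 2: hypothetical LLM call (k-th sample)
    train: Callable[[Optional[object], TaskCode], object],  # RL training, warm-started from prior policy
    score: Callable[[object, str], float],                 # Step 3: hypothetical rollout-based evaluator
    num_candidates: int = 3,                               # assumption: candidates sampled per subtask
) -> Tuple[object, List[str]]:
    # Step 1: natural-language subtask sequence for the target task.
    subtasks = propose_subtasks(target_task)
    policy: Optional[object] = None
    for description in subtasks:
        candidates = []
        for k in range(num_candidates):
            # Step 2: turn the subtask description into executable task code.
            code = translate(description, k)
            # Train a candidate policy on this task code, starting from the current policy.
            candidates.append(train(policy, code))
        # Step 3: keep the candidate that the evaluator scores highest for this subtask.
        policy = max(candidates, key=lambda p: score(p, description))
    return policy, subtasks


# Toy stand-ins (hypothetical) so the loop runs without an LLM or simulator.
def toy_propose(task: str) -> List[str]:
    return ["stand still", "walk slowly", task]


def toy_translate(desc: str, k: int) -> TaskCode:
    return TaskCode(reward_code=f"reward_{k}({desc!r})", goal_code=f"goal({desc!r})")


def toy_train(prev: Optional[object], code: TaskCode) -> List[str]:
    # A "policy" here is just the list of reward codes it was trained on.
    history = list(prev) if prev else []
    return history + [code.reward_code]


def toy_score(policy: List[str], desc: str) -> float:
    # Toy evaluator that always prefers the candidate built from sample k = 1.
    return 1.0 if policy[-1].startswith("reward_1") else 0.0


policy, subtasks = curriculum_loop(
    "walk forward at 1 m/s", toy_propose, toy_translate, toy_train, toy_score
)
print(policy)
```

In this sketch, keeping the LLM and trainer behind callables separates the curriculum logic (propose, translate, train, select) from any particular model or simulator. It also makes the loop testable with stubs like the toy functions above.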