We present an algorithm for skill discovery from expert demonstrations. The algorithm first uses Large Language Models (LLMs) to propose an initial segmentation of the demonstration trajectories. A hierarchical variational inference framework then incorporates this LLM-generated segmentation to discover reusable skills by merging trajectory segments. To further control the trade-off between compression and reusability, we introduce a novel auxiliary objective based on the Minimum Description Length (MDL) principle that guides the skill discovery process. Our results show that agents equipped with our method discover skills that accelerate learning and outperform baseline skill-learning approaches on new long-horizon tasks in BabyAI, a grid-world navigation environment, and ALFRED, a household simulation environment.
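To make the compression versus reusability trade-off concrete, the following is a minimal sketch of a two-part description length in the spirit of the Minimum Description Length principle. It is an illustrative assumption, not the auxiliary objective used in our method: the function `description_length`, the toy action alphabet, and the uniform coding scheme are all hypothetical, and the cost of encoding skill lengths is ignored. The total cost is the number of bits needed to write down the skill library plus the bits needed to rewrite every trajectory as a sequence of skill indices.

```python
import math


def description_length(segmented_trajs, n_actions):
    """Toy two-part code length in bits (hypothetical, uniform codes).

    segmented_trajs: list of trajectories, each a list of segments,
                     each segment a list of integer action ids.
    n_actions:       size of the primitive action alphabet.
    """
    # The skill library is the set of distinct segments.
    library = {tuple(seg) for traj in segmented_trajs for seg in traj}

    # Part 1: spell out every skill once, action by action.
    library_bits = sum(len(skill) for skill in library) * math.log2(n_actions)

    # Part 2: encode each segment as an index into the library.
    n_segments = sum(len(traj) for traj in segmented_trajs)
    index_bits = n_segments * math.log2(len(library)) if len(library) > 1 else 0.0

    return library_bits + index_bits


# Two demonstrations over 4 primitive actions, segmented three ways.
fine = [[[0], [1], [2], [3], [0], [1]], [[0], [1], [2], [3]]]   # one action per skill
merged = [[[0, 1], [2, 3], [0, 1]], [[0, 1], [2, 3]]]           # reusable 2-step skills
whole = [[[0, 1, 2, 3, 0, 1]], [[0, 1, 2, 3]]]                  # one skill per trajectory

print(description_length(fine, 4))    # 28.0
print(description_length(merged, 4))  # 13.0
print(description_length(whole, 4))   # 22.0
```

In this toy setting the intermediate segmentation is cheapest: very short skills are reusable but compress little, while whole-trajectory skills compress each demonstration into one index but are never reused, so the library itself becomes expensive.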