We consider the problem of unsupervised skill segmentation and hierarchical structure discovery in reinforcement learning. While recent approaches have sought to segment trajectories into reusable skills or options, most rely on action labels, rewards, or handcrafted annotations, limiting their applicability. We propose a method that segments unlabelled trajectories into skills and induces a hierarchical structure over them using a grammar-based approach. The resulting hierarchy captures both low-level behaviours and their composition into higher-level skills. We evaluate our approach in high-dimensional, pixel-based environments, including Craftax and the full, unmodified version of Minecraft. Using metrics for skill segmentation, reuse, and hierarchy quality, we find that our method consistently produces more structured and semantically meaningful hierarchies than existing baselines. Furthermore, as a proof of concept for utility, we demonstrate that these discovered hierarchies accelerate and stabilise learning on downstream reinforcement learning tasks.