Skill-based reinforcement learning (RL) approaches have shown considerable promise, especially in solving long-horizon tasks via hierarchical structures. These skills, learned task-agnostically from offline datasets, can accelerate the policy learning process for new tasks. Yet, the application of these skills across different domains remains restricted due to their inherent dependency on the datasets, which poses a challenge when attempting to learn a skill-based policy via RL for a target domain that differs from the datasets' domains. In this paper, we present DuSkill, a novel offline skill learning framework that employs a guided diffusion model to generate versatile skills extended from the limited skills in the datasets, thereby enhancing the robustness of policy learning for tasks in different domains. Specifically, we devise a guided diffusion-based skill decoder in conjunction with hierarchical encoding to disentangle the skill embedding space into two distinct representations: one encapsulating domain-invariant behaviors, and the other delineating the factors that induce domain variations in those behaviors. Our DuSkill framework enhances the diversity of skills learned offline, thus accelerating the learning of high-level policies for different domains. Through experiments, we show that DuSkill outperforms other skill-based imitation learning and RL algorithms on several long-horizon tasks, demonstrating its benefits in few-shot imitation and online RL.