In this paper, we propose a novel approach called DIffusion-guided DIversity (DIDI) for offline behavioral generation. The goal of DIDI is to learn a diverse set of skills from a mixture of label-free offline data. We achieve this by leveraging diffusion probabilistic models as priors to guide the learning process and regularize the policy. By optimizing a joint objective that incorporates diversity and diffusion-guided regularization, we encourage the emergence of diverse behaviors while maintaining similarity to the offline data. Experimental results in four decision-making domains (Push, Kitchen, Humanoid, and D4RL tasks) show that DIDI is effective in discovering diverse and discriminative skills. We also introduce skill stitching and skill interpolation, which highlight the generalist nature of the learned skill space. Further, by incorporating an extrinsic reward function, DIDI enables reward-guided behavior generation, facilitating the learning of diverse and optimal behaviors from sub-optimal data.
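The joint objective mentioned above can be illustrated with a minimal sketch: a diversity term that makes each skill identifiable from its trajectory, plus a diffusion-guided regularizer that keeps behavior close to the offline data. All names and quantities below (the skill log-probability, the diffusion prior score, the weight `beta`) are illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of a DIDI-style joint objective (assumed form, not the
# paper's exact method): diversity + diffusion-guided regularization.

def diversity_term(skill_logprob_given_traj):
    # Reward trajectories from which the skill label z is easy to
    # recover, i.e. log q(z | tau); higher means more discriminative.
    return skill_logprob_given_traj

def diffusion_reg(prior_score):
    # Score assigned by a diffusion prior trained on the offline data;
    # higher means the behavior stays closer to the data manifold.
    return prior_score

def joint_objective(skill_logprob_given_traj, prior_score, beta=0.5):
    # Weighted combination: encourage diverse, discriminative skills
    # while penalizing drift away from the offline data distribution.
    return diversity_term(skill_logprob_given_traj) + beta * diffusion_reg(prior_score)
```

In practice both terms would be estimated from sampled trajectories and optimized with respect to the policy parameters; the sketch only shows how the two pressures combine into one scalar objective.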