Learning diverse, high-performing behaviors from a limited set of demonstrations is a grand challenge. Traditional imitation learning (IL) methods usually fail at this task because most of them are designed to learn a single specific behavior, even when multiple demonstrations are available. Novel quality diversity (QD) imitation learning techniques are therefore needed. This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL), which 1) improves the stability of imitation learning in the QD setting through latent adversarial training based on a Wasserstein Auto-Encoder (WAE), and 2) mitigates a behavior-overfitting issue by using a measure-conditioned reward function with a single-step archive exploration bonus. Empirically, our method significantly outperforms state-of-the-art IL methods, achieving near-expert or beyond-expert QD performance on challenging continuous control tasks derived from MuJoCo environments.
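The idea of rewarding a policy both for imitation and for reaching new regions of the behavior archive can be illustrated with a minimal sketch. It assumes a grid archive over a bounded measure space and treats the exploration bonus as a fixed amount added whenever a step's measure falls into a cell the archive has not yet filled. The names `GridArchive`, `shaped_reward`, and `bonus_weight` are hypothetical, and the imitation reward is passed in as a number, assumed to come from a measure-conditioned discriminator that is not shown. This shows the general shape of an archive-based exploration bonus, not WQDIL's exact formulation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GridArchive:
    """Hypothetical grid archive that tracks which measure cells are occupied."""
    lower: tuple[float, ...]   # lower bound of each measure dimension
    upper: tuple[float, ...]   # upper bound of each measure dimension
    cells_per_dim: int         # number of grid cells along each dimension
    occupied: set = field(default_factory=set)

    def cell_index(self, measure: tuple[float, ...]) -> tuple[int, ...]:
        """Map a measure vector to its grid cell, clamping to the archive bounds."""
        idx = []
        for m, lo, hi in zip(measure, self.lower, self.upper):
            i = int((m - lo) / (hi - lo) * self.cells_per_dim)
            idx.append(min(max(i, 0), self.cells_per_dim - 1))
        return tuple(idx)

    def add(self, measure: tuple[float, ...]) -> None:
        """Mark the cell containing this measure as occupied."""
        self.occupied.add(self.cell_index(measure))


def shaped_reward(imitation_reward: float, measure: tuple[float, ...],
                  archive: GridArchive, bonus_weight: float = 1.0) -> float:
    """Imitation reward plus a bonus if this step's measure lands in an unoccupied cell.

    Assumption: `imitation_reward` is produced elsewhere, e.g. by a
    measure-conditioned discriminator; only the bonus logic is shown here.
    """
    cell = archive.cell_index(measure)
    bonus = bonus_weight if cell not in archive.occupied else 0.0
    return imitation_reward + bonus


# Usage: a 10x10 archive over [0, 1]^2 with one occupied cell.
archive = GridArchive(lower=(0.0, 0.0), upper=(1.0, 1.0), cells_per_dim=10)
archive.add((0.05, 0.05))
print(shaped_reward(0.5, (0.05, 0.05), archive))  # 0.5: cell already occupied
print(shaped_reward(0.5, (0.95, 0.95), archive))  # 1.5: new cell earns the bonus
```

Because the bonus depends only on whether the current step's cell is already filled, it pushes the policy toward unexplored behaviors without changing the imitation term itself.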