We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selects the appropriate discrete skill and continuous parameters at each timestep. Using a combination of temporal variational inference and information-theoretic regularization methods, we address the challenge of degeneracy common in latent variable models, ensuring that the learned skills are temporally extended, semantically meaningful, and adaptable. We empirically show that learning parameterized skills from multitask expert demonstrations significantly improves generalization to unseen tasks. Our method outperforms multitask as well as skill learning baselines on both LIBERO and MetaWorld benchmarks. We also demonstrate that DEPS discovers interpretable parameterized skills, such as an object grasping skill whose continuous arguments define the grasp location.
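The two-level control structure described above (a meta-policy that picks a discrete skill and continuous parameters at each timestep, and a parameterized skill policy that turns them into an action) can be illustrated with a minimal sketch. This is not the DEPS implementation: the names `meta_policy`, `skill_policy`, `rollout`, and `MetaDecision`, the hand-written logits, the Gaussian parameter sampling, and the toy dynamics are all assumptions made for illustration. In the actual method both policies are learned from demonstrations.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class MetaDecision:
    skill: int            # index of the selected discrete skill
    params: List[float]   # continuous arguments passed to that skill


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def meta_policy(state: Sequence[float], num_skills: int, param_dim: int,
                rng: random.Random) -> MetaDecision:
    """Hypothetical stand-in for a learned meta-policy.

    Samples a discrete skill from a categorical distribution and
    continuous parameters from a unit Gaussian.
    """
    # Assumption: toy logits that depend on the state; a real model would be learned.
    logits = [0.1 * (k + 1) * sum(state) for k in range(num_skills)]
    probs = softmax(logits)
    skill = rng.choices(range(num_skills), weights=probs, k=1)[0]
    params = [rng.gauss(0.0, 1.0) for _ in range(param_dim)]
    return MetaDecision(skill=skill, params=params)


def skill_policy(skill: int, params: Sequence[float],
                 state: Sequence[float]) -> List[float]:
    """Hypothetical parameterized skill.

    Moves the state toward the target given by `params`, using a
    skill-specific gain.
    """
    gain = 0.5 / (skill + 1)
    return [gain * (p - s) for p, s in zip(params, state)]


def rollout(initial_state: Sequence[float], steps: int, num_skills: int = 3,
            seed: int = 0) -> List[Tuple[int, List[float], List[float]]]:
    """Runs the two-level policy for `steps` timesteps in toy additive dynamics."""
    rng = random.Random(seed)
    state = list(initial_state)
    trace = []
    for _ in range(steps):
        # The meta-policy is queried at every timestep, as in the description above.
        decision = meta_policy(state, num_skills, len(state), rng)
        action = skill_policy(decision.skill, decision.params, state)
        state = [s + a for s, a in zip(state, action)]
        trace.append((decision.skill, decision.params, action))
    return trace
```

Calling `rollout([0.0, 0.0], steps=5)` returns one `(skill, params, action)` triple per timestep. Here the continuous parameters act as a target position, loosely mirroring the grasp-location example, but that mapping is an assumption of this sketch.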