Self-supervised skill learning aims to acquire useful behaviors that exploit the underlying dynamics of the environment. Latent-variable models based on mutual information maximization have been particularly successful at this task but still struggle in robotic manipulation. Because manipulation requires affecting a potentially large set of the environment's degrees of freedom, mutual information maximization alone fails to produce useful manipulation behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach to skill discovery with a particular focus on robotic manipulation. Our main insight is that using multiple critics in an actor-critic framework to gracefully combine multiple reward functions significantly improves latent-variable skill discovery for robotic manipulation. It also overcomes the interference among rewards that can otherwise hinder convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate that our skill discovery approach can acquire safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leverage them through planning, surpassing state-of-the-art skill discovery approaches by a large margin.
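The abstract describes the multi-critic idea only at a high level. As a hedged illustration of what combining rewards through multiple critics can look like, the sketch below assumes one critic per reward function. It computes a one-step TD advantage for each reward stream separately, normalizes each stream on its own, and takes a weighted sum to obtain a single advantage for the actor. The function names, the mixing weights, the one-step TD estimator, and the per-critic normalization are illustrative assumptions rather than SLIM's confirmed formulation. Episode termination is ignored for brevity.

```python
import numpy as np


def td_advantages(rewards, values, next_values, gamma=0.99):
    """One-step TD advantage per reward stream: r + gamma * V(s') - V(s).

    Assumption: no terminal-state masking, for brevity.
    """
    return rewards + gamma * next_values - values


def combine_multi_critic(reward_streams, value_preds, next_value_preds,
                         weights, gamma=0.99, eps=1e-8):
    """Hypothetical multi-critic advantage combination.

    reward_streams, value_preds, next_value_preds: shape (K, T),
        one row per reward function and its dedicated critic.
    weights: shape (K,), hypothetical mixing coefficients.
    Returns one combined advantage per timestep, shape (T,).
    """
    rewards = np.asarray(reward_streams, dtype=float)
    values = np.asarray(value_preds, dtype=float)
    next_values = np.asarray(next_value_preds, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Each critic evaluates only its own reward stream.
    adv = td_advantages(rewards, values, next_values, gamma)  # (K, T)

    # Normalize each critic's advantages separately, so a reward with a
    # large numeric scale cannot drown out the others in the actor update.
    mean = adv.mean(axis=1, keepdims=True)
    std = adv.std(axis=1, keepdims=True)
    adv_norm = (adv - mean) / (std + eps)

    # Weighted sum across critics yields a single advantage per timestep.
    return (w[:, None] * adv_norm).sum(axis=0)


# Usage: two reward streams with very different scales.
rewards = np.array([[1.0, 0.0, 1.0, 0.0],
                    [0.0, 2000.0, 0.0, 2000.0]])
zeros = np.zeros_like(rewards)
combined = combine_multi_critic(rewards, zeros, zeros, weights=[1.0, 0.5])
print(combined)
```

Because each stream is normalized independently, rescaling one reward (here by a factor of 1000) leaves the combined advantage unchanged. With a single critic trained on the summed reward, the larger-scale term would dominate. This is one plausible reading of the "interference among rewards" mentioned above, not a claim about how SLIM measures it.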