Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models based on mutual information maximization have been successful in this task but still struggle in the context of robotic manipulation. Because manipulation requires affecting a potentially large set of the degrees of freedom that make up the environment, mutual information maximization alone fails to produce useful and safe manipulation behaviors. Moreover, addressing this by augmenting skill discovery rewards with additional rewards through a naive combination may fail to produce the desired behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that using multiple critics in an actor-critic framework to gracefully combine multiple reward functions significantly improves latent-variable skill discovery for robotic manipulation, while overcoming the interference that can occur among rewards and hinder convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate that our novel skill discovery approach can acquire safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leverage them through planning, significantly surpassing baseline skill discovery approaches.
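To make the multi-critic idea concrete, here is a minimal sketch of one generic way to combine several reward functions with one critic per reward instead of summing the rewards first. It is an illustration under stated assumptions, not SLIM's actual formulation: the function names `td_targets` and `multi_critic_advantage`, the one-step TD advantage estimate, the per-critic advantage normalization, and the fixed weights are all hypothetical choices made for this example.

```python
import numpy as np


def td_targets(rewards, next_values, dones, gamma=0.99):
    """One-step TD targets for a critic: r + gamma * V(s') * (1 - done)."""
    return rewards + gamma * next_values * (1.0 - dones)


def multi_critic_advantage(rewards_per_critic, values, next_values, dones,
                           weights, gamma=0.99, eps=1e-8):
    """Hypothetical multi-critic combination (illustrative only).

    rewards_per_critic, values, next_values: shape (K, B), one row per critic,
        each critic trained only on its own reward.
    dones: shape (B,); weights: shape (K,).
    Returns (targets, combined): targets (K, B) are each critic's regression
    targets; combined (B,) is the advantage signal passed to the shared actor.
    """
    # Each critic gets its own TD target from its own reward stream.
    targets = td_targets(rewards_per_critic, next_values, dones[None, :], gamma)
    adv = targets - values  # per-critic advantage estimates, shape (K, B)
    # Normalize each critic's advantages separately, so a reward with a large
    # scale cannot drown out the others (an assumption of this sketch).
    adv = (adv - adv.mean(axis=1, keepdims=True)) / (adv.std(axis=1, keepdims=True) + eps)
    # Weighted sum of normalized advantages drives the single actor.
    combined = (weights[:, None] * adv).sum(axis=0)
    return targets, combined
```

By contrast, a naive combination would sum the raw rewards and train a single critic on the total, so the reward with the largest magnitude or variance tends to dominate the learning signal. Keeping one critic per reward and combining only at the actor level is one way to limit that kind of interference.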