Behavioral cloning, or more broadly learning from demonstration (LfD), is a promising direction for robot policy learning in complex scenarios. Although straightforward to implement and data-efficient, behavioral cloning has its own drawbacks, which limit its efficacy in real robot setups. In this work, we take a step toward improving learning-from-demonstration algorithms by leveraging implicit, energy-based policy models. Results suggest that, in selected complex robot policy learning scenarios, casting supervised policy learning as an implicit model performs better, on average, than the commonly used explicit neural-network models, especially when approximating potentially discontinuous and multimodal functions.
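The contrast between explicit and implicit policies on multimodal targets can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the method of this work. The demonstrations are hypothetical: for every observation the demonstrator picked action +1 or -1 with equal probability. The explicit policy is a least-squares regressor, which averages the two modes. The implicit policy returns `argmin_a E(o, a)`, found by derivative-free sampling of candidate actions. The source does not specify the energy model or how it is trained. Here a nonparametric Gaussian kernel-density energy is swapped in for a learned neural energy, and the names `energy`, `explicit_policy`, `implicit_policy` and `SIGMA` are hypothetical.

```python
import numpy as np

# Hypothetical bimodal demonstrations: for each observation o in [-1, 1],
# the demonstrator chose a = +1 or a = -1 with equal probability.
rng = np.random.default_rng(0)
obs = rng.uniform(-1.0, 1.0, size=200)
acts = np.where(rng.random(200) < 0.5, 1.0, -1.0)

# Explicit policy: linear least-squares regression a = w * o + b (MSE loss).
X = np.stack([obs, np.ones_like(obs)], axis=1)
w, b = np.linalg.lstsq(X, acts, rcond=None)[0]


def explicit_policy(o):
    # Directly outputs one action; MSE pushes it toward the mean of the modes.
    return w * o + b


# Assumed energy: negative log of a Gaussian kernel density over the
# (observation, action) demo pairs. It stands in for a learned energy model.
SIGMA = 0.1


def energy(o, a):
    # o: scalar observation; a: array of K candidate actions -> K energies.
    d2 = (obs[None, :] - o) ** 2 + (acts[None, :] - a[:, None]) ** 2
    x = -d2 / (2.0 * SIGMA**2)
    m = x.max(axis=1, keepdims=True)  # stable log-sum-exp
    return -(m[:, 0] + np.log(np.exp(x - m).sum(axis=1)))


def implicit_policy(o, num_samples=1024):
    # Derivative-free inference: sample candidate actions uniformly and
    # return the one with the lowest energy.
    candidates = rng.uniform(-2.0, 2.0, size=num_samples)
    return candidates[np.argmin(energy(o, candidates))]


print("explicit:", explicit_policy(0.3))
print("implicit:", implicit_policy(0.3))
```

On this data, the explicit regressor predicts an action near 0. That is a blend of the two modes that no demonstrator ever took. The implicit policy instead lands close to one of the actual modes, +1 or -1. Sampling-based argmin keeps the sketch short. A real system would pair it with a learned energy network and a proper optimizer over a higher-dimensional action space.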