Behavioral cloning, or more broadly learning from demonstrations (LfD), is a promising direction for robot policy learning in complex scenarios. Although it is straightforward to implement and data-efficient, behavioral cloning has drawbacks that limit its efficacy in real robot setups. In this work, we take one step towards improving learning-from-demonstration algorithms by leveraging implicit energy-based policy models. Results suggest that in selected complex robot policy learning scenarios, treating supervised policy learning with an implicit model performs better, on average, than commonly used neural network-based explicit models, especially when approximating potentially discontinuous and multimodal functions.
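To make the implicit-versus-explicit distinction concrete, the following is a minimal sketch rather than the method evaluated in this work. It assumes that an implicit policy picks the action minimizing an energy function over sampled candidates, whereas an explicit policy would map the observation directly to an action. The `energy` function here is hand-written and hypothetical (in practice it would be a learned network), and `implicit_policy`, its sampling range, and the sample count are illustrative choices.

```python
import numpy as np


def energy(obs, actions):
    # Hypothetical, hand-written energy standing in for a learned model.
    # Low energy near action -1 when obs < 0 and near action +1 otherwise,
    # so the best action jumps discontinuously at obs = 0.
    target = np.where(obs < 0.0, -1.0, 1.0)
    return (actions - target) ** 2


def implicit_policy(obs, num_samples=4096, low=-2.0, high=2.0, seed=0):
    # Sampling-based inference (an assumption of this sketch):
    # draw candidate actions uniformly and return the lowest-energy one.
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(low, high, size=num_samples)
    return candidates[np.argmin(energy(obs, candidates))]


# Two nearly identical observations on either side of the discontinuity.
a_left = implicit_policy(-0.01)
a_right = implicit_policy(0.01)
print(a_left, a_right)
```

In this toy setting the selected action switches sharply between the two modes, whereas a continuous explicit regressor fitted to the same data would tend to interpolate between them near the boundary.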