Behavioral cloning, or more broadly learning from demonstrations (LfD), is a promising direction for robot policy learning in complex scenarios. Although it is straightforward to implement and data-efficient, behavioral cloning has drawbacks that limit its efficacy in real robot setups. In this work, we take a step towards improving learning-from-demonstration algorithms by leveraging implicit energy-based policy models. Results suggest that, in selected complex robot policy learning scenarios, treating supervised policy learning with an implicit model performs better on average than commonly used neural network-based explicit models, especially when approximating potentially discontinuous and multimodal functions.
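To make the distinction between explicit and implicit models concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a hand-written, hypothetical energy function `energy` with two equally good actions, and a hypothetical helper `implicit_policy` that picks the lowest-energy action from a grid of candidates. An explicit model fit with mean squared error tends toward the mean of the demonstrated actions, which the sketch stands in for with a plain average.

```python
import numpy as np


def energy(obs, actions):
    # Hypothetical hand-written energy with two minima, at actions -1 and +1.
    # A real implicit policy would use a trained network E_theta(obs, action);
    # obs is unused here because this toy energy is the same for every observation.
    return np.minimum((actions - 1.0) ** 2, (actions + 1.0) ** 2)


def implicit_policy(obs, candidates):
    # Implicit model: choose the candidate action with the lowest energy.
    return candidates[np.argmin(energy(obs, candidates))]


# Multimodal demonstrations for one observation: half pick -1, half pick +1.
demo_actions = np.array([-1.0, 1.0, -1.0, 1.0])

# Stand-in for an explicit regressor trained with mean squared error, whose
# optimum is the mean of the demonstrated actions.
explicit_action = demo_actions.mean()

# Grid of candidate actions (an assumption; sampling-based optimizers are also common).
candidates = np.linspace(-2.0, 2.0, 401)
implicit_action = implicit_policy(obs=0.0, candidates=candidates)

print("explicit (mean) action:", explicit_action)
print("implicit (argmin) action:", implicit_action)
```

In this toy setting the averaged action lies between the two demonstrated modes and matches neither, while the energy minimizer returns one of the demonstrated modes.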