Imitation learning, which learns from demonstrations, has been studied and advanced for sequential decision-making tasks in which no reward function is predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we use self-supervised representation learning, which can generate abundant training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method that learns state and action representations that are robust to diverse distortions and temporally predictive, on non-image control tasks. In particular, in contrast to existing self-supervised learning methods for tabular data, we propose a different corruption method that makes state and action representations robust to diverse distortions. We find, both theoretically and empirically, that building an informative feature manifold with lower sample complexity significantly improves the performance of imitation learning. The proposed method achieves a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations of varying optimality to provide insights into a range of factors.
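To make the notion of corrupting tabular state-action data concrete, the following is a minimal sketch of a generic baseline corruption of the kind used by existing self-supervised methods for tabular data: each feature is replaced, with some probability, by a value drawn from that feature's empirical marginal in the batch. It is not the corruption method proposed in this work, whose details the abstract does not give. The function name `corrupt_features`, the parameter `corruption_rate`, and the state and action dimensions are illustrative assumptions.

```python
import numpy as np


def corrupt_features(batch, corruption_rate=0.3, rng=None):
    """Generic marginal-resampling corruption for tabular data.

    Hypothetical helper for illustration only; NOT the corruption
    method proposed in the paper.

    batch: array of shape [n_samples, n_features], e.g. concatenated
           state-action pairs.
    Returns (corrupted, mask), where mask[i, j] is True if feature j
    of sample i was selected for replacement.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = np.asarray(batch, dtype=float)
    n_samples, n_features = batch.shape

    # Select each entry for corruption independently.
    mask = rng.random((n_samples, n_features)) < corruption_rate

    # For every entry, pick a random donor row; taking the donor's value
    # in the same column samples from that column's empirical marginal.
    donor_rows = rng.integers(0, n_samples, size=(n_samples, n_features))
    replacements = batch[donor_rows, np.arange(n_features)]

    corrupted = np.where(mask, replacements, batch)
    return corrupted, mask


# Illustrative usage with synthetic data (dimensions are assumptions).
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 11))   # 100 expert states
actions = rng.normal(size=(100, 3))   # 100 matching actions
pairs = np.concatenate([states, actions], axis=1)
corrupted_pairs, mask = corrupt_features(pairs, corruption_rate=0.3, rng=rng)
```

In a contrastive setup, an encoder would typically be trained so that the representation of `pairs[i]` stays close to that of `corrupted_pairs[i]`, which is one common way to encourage robustness to such distortions.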