Imitation learning, which learns from demonstrations, has been studied and advanced for sequential decision-making tasks in which no reward function is predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we utilize self-supervised representation learning, which can generate vast training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method that learns state and action representations that are robust to diverse distortions and temporally predictive on non-image control tasks. In particular, in contrast to existing self-supervised learning methods for tabular data, we propose a different corruption method for state and action representations that is robust to diverse distortions. We observe, both theoretically and empirically, that building an informative feature manifold with lower sample complexity significantly improves the performance of imitation learning. The proposed method achieves a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations of varying optimality to provide insight into the factors that affect performance.
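To make the idea of corrupting tabular state-action data concrete, here is a minimal sketch. The abstract does not specify the proposed corruption method, so the code below is not that method: it illustrates a generic baseline in the style of existing tabular self-supervised approaches, where randomly selected features are replaced by values drawn from the same feature's empirical marginal. The function name `corrupt_features`, the `corruption_rate` parameter, and the state and action dimensions are all assumptions chosen for illustration.

```python
import numpy as np


def corrupt_features(batch, corruption_rate=0.3, rng=None):
    """Return a corrupted copy of `batch` (shape [n, d]) and the boolean mask.

    Illustrative baseline only (not the paper's method): each selected entry
    is replaced by the value of the same feature taken from a randomly chosen
    row of the batch, i.e. a draw from that feature's empirical marginal.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = batch.shape
    # Choose which entries to corrupt, independently per entry.
    mask = rng.random((n, d)) < corruption_rate
    # For every entry, pick a donor row; the column index stays the same.
    donor_rows = rng.integers(0, n, size=(n, d))
    donors = batch[donor_rows, np.arange(d)]
    corrupted = np.where(mask, donors, batch)
    return corrupted, mask


# Hypothetical data: 100 expert state-action pairs with assumed dimensions.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 11))   # assumed state dimension
actions = rng.normal(size=(100, 3))   # assumed action dimension
pairs = np.concatenate([states, actions], axis=1)

corrupted_pairs, mask = corrupt_features(pairs, corruption_rate=0.3, rng=rng)
print(corrupted_pairs.shape, round(float(mask.mean()), 2))
```

In a contrastive setup, the original and corrupted rows would typically serve as two views of the same sample, and an encoder would be trained so that their representations agree. How the corrupted views are used here, and how the proposed corruption differs from this baseline, is left to the full paper.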