Reinforcement learning (RL) provides a powerful framework for decision-making, but applying it in practice often requires a carefully designed reward function. Adversarial Imitation Learning (AIL) enables automatic policy acquisition without access to the reward signal from the environment. In this work, we propose Auto-Encoding Adversarial Imitation Learning (AEAIL), a robust and scalable AIL framework. To induce expert policies from demonstrations, AEAIL uses the reconstruction error of an auto-encoder as the reward signal, which provides more information for policy optimization than the rewards of prior discriminator-based methods. We then train the auto-encoder and the agent policy with the derived objective functions. Experiments show that AEAIL outperforms state-of-the-art methods in both state-based and image-based environments. More importantly, AEAIL is considerably more robust when the expert demonstrations are noisy.
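The idea of using an auto-encoder's reconstruction error as a reward can be illustrated with a minimal sketch. The abstract does not give the exact reward formula or network architecture, so everything here is an assumption made for illustration: the function name `reconstruction_reward`, the single-layer `tanh` encoder with a linear decoder, the random untrained weights, and the choice of negative mean squared error as the reward.

```python
import numpy as np


def reconstruction_reward(x, w_enc, w_dec):
    """Per-sample reward from auto-encoder reconstruction error.

    x has shape (batch, dim). The mapping reward = -MSE is an assumed,
    illustrative choice, not the formula from the paper.
    """
    z = np.tanh(x @ w_enc)                     # encode to latent space
    x_hat = z @ w_dec                          # decode back to input space
    err = np.mean((x - x_hat) ** 2, axis=-1)   # per-sample reconstruction error
    return -err                                # lower error -> higher reward


# Hypothetical sizes and untrained placeholder weights.
rng = np.random.default_rng(0)
dim, latent = 6, 3
w_enc = rng.normal(scale=0.1, size=(dim, latent))
w_dec = rng.normal(scale=0.1, size=(latent, dim))

batch = rng.normal(size=(4, dim))              # stand-in for state-action vectors
rewards = reconstruction_reward(batch, w_enc, w_dec)
print(rewards.shape)
```

Unlike a discriminator that emits a single probability per sample, the reconstruction error here aggregates a residual over every input dimension, which is one way to read the abstract's claim that the signal carries more information. In an actual training loop the weights would be learned with the objectives the paper derives, rather than left random as in this sketch.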