Learning state representations has steadily gained popularity in reinforcement learning (RL) due to its potential to improve both sample efficiency and returns in many environments. A straightforward and efficient method is to generate representations with a distinct neural network trained on an auxiliary task, i.e. a task that differs from the actual RL task. While a whole range of such auxiliary tasks has been proposed in the literature, a comparison on typical continuous control benchmark environments is computationally expensive and has, to the best of our knowledge, not been performed before. This paper presents such a comparison of common auxiliary tasks, based on hundreds of agents trained with state-of-the-art off-policy RL algorithms. We compare possible improvements in both sample efficiency and returns for environments ranging from a simple pendulum to a complex simulated robotics task. Our findings show that representation learning with auxiliary tasks is beneficial for environments of higher dimension and complexity, and that learning environment dynamics is preferable to predicting rewards. We believe these insights will enable other researchers to make more informed decisions on how to utilize representation learning for their specific problem.
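To make the idea of a representation network trained on an auxiliary task concrete, the following is a minimal sketch of one such task, forward dynamics prediction. It is not the setup used in this paper: the linear encoder, the linear prediction head, the 4-D toy observations, the 2-D latent size, and all names (`make_transitions`, `train_encoder`, `observe`) are assumptions chosen purely for illustration.

```python
import random

LATENT_DIM, OBS_DIM = 2, 4  # hypothetical sizes, chosen for illustration only


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def observe(s):
    """Toy observation model: 4-D observation of a 2-D hidden state."""
    return [s[0], s[1], s[0] + s[1], s[0] - s[1]]


def make_transitions(n, rng):
    """Sample (obs, action, next_obs) tuples from an assumed linear system."""
    data = []
    for _ in range(n):
        s = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        a = rng.uniform(-1, 1)
        s_next = [0.9 * s[0] + 0.1 * s[1], 0.9 * s[1] + 0.1 * a]
        data.append((observe(s), a, observe(s_next)))
    return data


def mean_loss(W, P, q, data):
    """Mean of half the squared next-observation prediction error."""
    total = 0.0
    for obs, a, target in data:
        z = matvec(W, obs)  # representation z = W * obs
        pred = [p + qi * a for p, qi in zip(matvec(P, z), q)]
        total += 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(data)


def train_encoder(seed=0, epochs=50, lr=0.02):
    """Train encoder W and dynamics head (P, q) with plain per-sample SGD."""
    rng = random.Random(seed)
    data = make_transitions(200, rng)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(OBS_DIM)]
         for _ in range(LATENT_DIM)]
    P = [[rng.uniform(-0.5, 0.5) for _ in range(LATENT_DIM)]
         for _ in range(OBS_DIM)]
    q = [0.0] * OBS_DIM
    initial = mean_loss(W, P, q, data)
    for _ in range(epochs):
        for obs, a, target in data:
            z = matvec(W, obs)
            pred = [p + qi * a for p, qi in zip(matvec(P, z), q)]
            err = [p - t for p, t in zip(pred, target)]
            # Gradient w.r.t. z, computed before the head P is updated.
            grad_z = [sum(err[i] * P[i][j] for i in range(OBS_DIM))
                      for j in range(LATENT_DIM)]
            for i in range(OBS_DIM):
                q[i] -= lr * err[i] * a
                for j in range(LATENT_DIM):
                    P[i][j] -= lr * err[i] * z[j]
            # The auxiliary loss is the only training signal for the encoder.
            for j in range(LATENT_DIM):
                for k in range(OBS_DIM):
                    W[j][k] -= lr * grad_z[j] * obs[k]
    return W, initial, mean_loss(W, P, q, data)


if __name__ == "__main__":
    W, initial, final = train_encoder()
    print("auxiliary loss before:", initial, "after:", final)
```

In this sketch the encoder never sees the RL objective. An agent would receive `matvec(W, obs)` as its state input in place of the raw observation. A reward prediction task would follow the same pattern with the prediction target swapped from the next observation to the reward.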