Current reinforcement learning algorithms train an agent on forward-generated trajectories, which provide little guidance so that the agent can explore as much as possible. Although the value of reinforcement learning comes from sufficient exploration, this approach sacrifices sample efficiency, an essential factor in algorithm performance. Previous work uses reward-shaping techniques and network structure modifications to increase sample efficiency. However, these methods require many steps to implement. In this work, we propose a novel backward curriculum reinforcement learning method that begins training the agent on the backward trajectory of the episode instead of the original forward trajectory. This approach provides the agent with a strong reward signal, enabling more sample-efficient learning. Moreover, our method requires only a minor change to the algorithm, namely reversing the order of the trajectory before agent training, so it can be applied straightforwardly to any state-of-the-art algorithm.
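The trajectory reversal described above can be illustrated with a minimal sketch. It is not the authors' implementation: tabular Q-learning on a toy four-step chain is swapped in purely for illustration, and every name here (`Transition`, `reverse_trajectory`, `q_update_pass`, `fresh_q`) is hypothetical. The sketch assumes a single episode whose only reward arrives on the final transition, and compares one update pass over the forward trajectory with one pass over the reversed trajectory.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    """One step of an episode (hypothetical container for this sketch)."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


def reverse_trajectory(trajectory: List[Transition]) -> List[Transition]:
    """Return the episode's transitions in backward (last-to-first) order."""
    return list(reversed(trajectory))


def q_update_pass(q: List[List[float]], trajectory: List[Transition],
                  alpha: float = 1.0, gamma: float = 0.9) -> List[List[float]]:
    """Apply one tabular Q-learning update per transition, in the given order."""
    for t in trajectory:
        if t.done:
            target = t.reward
        else:
            target = t.reward + gamma * max(q[t.next_state])
        q[t.state][t.action] += alpha * (target - q[t.state][t.action])
    return q


def fresh_q(n_states: int = 5, n_actions: int = 1) -> List[List[float]]:
    """Zero-initialised Q-table."""
    return [[0.0] * n_actions for _ in range(n_states)]


# Toy chain (an assumption for this sketch): states 0..4, a single action
# that moves right, and a reward of 1.0 only on the transition into state 4.
episode = [
    Transition(state=s, action=0, reward=1.0 if s == 3 else 0.0,
               next_state=s + 1, done=(s == 3))
    for s in range(4)
]

q_forward = q_update_pass(fresh_q(), episode)
q_backward = q_update_pass(fresh_q(), reverse_trajectory(episode))

# Count how many visited states received a nonzero value estimate.
forward_updated = sum(1 for s in range(4) if q_forward[s][0] != 0.0)
backward_updated = sum(1 for s in range(4) if q_backward[s][0] != 0.0)
print("forward pass, states with nonzero value:", forward_updated)
print("backward pass, states with nonzero value:", backward_updated)
```

In this toy setting, the forward pass leaves every state but the last with a zero estimate, because each update bootstraps from a successor that has not been updated yet. The backward pass propagates the terminal reward through all four visited states in a single sweep, which is one way to read the "strong reward signal" claim. Whether the same effect carries over to a particular deep RL algorithm depends on that algorithm's update rule.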