Reinforcement learning is challenging in delayed scenarios, a common real-world situation in which observations and interactions arrive with delays. State-of-the-art (SOTA) state-augmentation techniques suffer either from state-space explosion as the number of delayed steps grows or from performance degeneration in stochastic environments. To address these challenges, we propose Auxiliary-Delayed Reinforcement Learning (AD-RL), which leverages an auxiliary short-delayed task to accelerate learning on a long-delayed task without compromising performance in stochastic environments. Specifically, AD-RL learns the value function in the short-delayed task and then uses it for bootstrapping and policy improvement in the long-delayed task. We show theoretically that this can greatly reduce sample complexity compared with learning directly on the original long-delayed task. On deterministic and stochastic benchmarks, our method substantially outperforms the SOTA methods in both sample efficiency and policy performance.
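To make the two ideas in the abstract concrete, the following is a minimal tabular sketch, not the authors' implementation. It assumes the usual augmented state (the last observed state plus the queue of pending actions) and a Q-learning-style target with a max over actions; the abstract only says "bootstrapping", so that choice, the function names, and the parameters alpha and gamma are all illustrative assumptions.

```python
from collections import defaultdict


def augmented_state_count(n_states: int, n_actions: int, delay: int) -> int:
    """Number of augmented states when the agent tracks one observed
    state plus `delay` pending actions (assumed augmentation scheme)."""
    return n_states * n_actions ** delay


def bootstrapped_update(q_long, q_short, x_long, action, reward,
                        x_short_next, actions, alpha=0.1, gamma=0.99):
    """One hypothetical update of the long-delay Q-table.

    The target bootstraps from the short-delay Q-table `q_short`
    instead of from `q_long` itself. The max over actions is an
    assumption made for this sketch.
    """
    target = reward + gamma * max(q_short[(x_short_next, a)] for a in actions)
    q_long[(x_long, action)] += alpha * (target - q_long[(x_long, action)])
    return q_long[(x_long, action)]


# State-space growth: 10 states and 4 actions, delay 2 versus delay 10.
small = augmented_state_count(10, 4, 2)
large = augmented_state_count(10, 4, 10)

# A single update on toy tables (all keys here are made up).
q_long = defaultdict(float)
q_short = defaultdict(float)
x_long = ("s0", (0, 1, 0, 1))   # observed state + 4 pending actions
x_short_next = ("s3", (1,))     # newer state + 1 pending action
q_short[(x_short_next, 0)] = 1.0
new_value = bootstrapped_update(q_long, q_short, x_long, action=1,
                                reward=1.0, x_short_next=x_short_next,
                                actions=[0, 1], alpha=0.5, gamma=0.5)
```

The count grows exponentially in the delay, which is the state-space explosion the abstract refers to. The update shows why a short-delayed auxiliary task can help: its value table lives on a much smaller augmented space, so its estimates can be used as targets for the larger long-delayed table.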