Real-world reinforcement learning applications are often hindered by delayed feedback from environments, which violates the Markov assumption and introduces significant challenges. Although numerous delay-compensating methods have been proposed for environments with constant delays, environments with random delays remain largely unexplored due to their inherent variability and unpredictability. In this study, we propose a simple yet robust agent for decision-making under random delays, termed the conservative agent, which reformulates the random-delay environment into its constant-delay equivalent. This transformation enables any state-of-the-art constant-delay method to be extended directly to random-delay environments without modifying the algorithmic structure or sacrificing performance. We evaluate the conservative agent-based algorithm on continuous control tasks, and empirical results demonstrate that it significantly outperforms existing baseline algorithms in terms of both asymptotic performance and sample efficiency.
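The core transformation described above can be illustrated with a minimal sketch. All names here are hypothetical and the mechanics are an assumption inferred from the abstract: if every random delay is bounded by some maximum `d_max`, a conservative agent can buffer observations and act only on the observation generated exactly `d_max` steps ago, so the agent always perceives a constant delay of `d_max`.

```python
import random


def simulate_conservative(n_steps, d_max, seed=0):
    """Hypothetical sketch of the conservative transformation.

    Each observation generated at step t arrives after a random delay
    d_t in [0, d_max]. The conservative agent deliberately consumes the
    observation generated d_max steps earlier, which is guaranteed to
    have arrived, so every step behaves like a constant-delay step.
    Returns the list of perceived delays (all equal to d_max).
    """
    rng = random.Random(seed)
    pending = {}           # generation step -> (arrival step, observation)
    perceived_delays = []
    for t in range(n_steps):
        # Environment emits the step-t observation with a random delay.
        delay = rng.randint(0, d_max)
        pending[t] = (t + delay, f"obs_{t}")
        # Conservative agent consumes the observation from step t - d_max.
        gen = t - d_max
        if gen in pending:
            arrival, obs = pending.pop(gen)
            assert arrival <= t    # guaranteed, since delay <= d_max
            perceived_delays.append(t - gen)
    return perceived_delays
```

Under this sketch, the random-delay stream is converted into a constant-delay one, at the cost of always waiting for the worst-case delay; any constant-delay method can then be applied on top unchanged.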