Classic reinforcement learning (RL) frequently confronts challenges in tasks involving delays, which cause a mismatch between received observations and subsequent actions, thereby violating the Markov assumption. Existing methods usually tackle this issue with end-to-end solutions based on state augmentation. However, these black-box approaches often rely on opaque processing and carry redundant information in their information states, causing instability and potentially undermining overall performance. To alleviate delay challenges in RL, we propose $\textbf{DEER (Delay-resilient Encoder-Enhanced RL)}$, a framework designed to improve interpretability and address random delays. DEER employs an encoder, pretrained on delay-free environment datasets, to map delayed states, together with the past action sequences whose lengths vary with the delay, into fixed-size hidden states. In a variety of delayed scenarios, the trained encoder integrates seamlessly with standard RL algorithms without further modification, enhancing their delay-handling capability simply by adapting the input dimension of the original algorithms. We evaluate DEER through extensive experiments on Gym and MuJoCo environments. The results confirm that DEER outperforms state-of-the-art RL algorithms in both constant and random delay settings.
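The interface the abstract describes can be sketched as follows. This is a minimal, hypothetical illustration (all class and function names are ours, not from the paper): a pretrained encoder takes a delayed state plus the variable-length action sequence executed since that state was observed, and emits a fixed-size hidden state, so a standard RL algorithm only needs its input dimension changed. A toy stand-in using a fixed linear projection, assuming a known maximum delay for padding:

```python
import numpy as np

class DelayEncoder:
    """Toy stand-in for a DEER-style pretrained encoder (illustrative only).

    Pads the action history up to a maximum delay, concatenates it with the
    delayed observation, and applies a fixed linear projection. The real
    encoder would be a network pretrained on delay-free trajectories.
    """

    def __init__(self, state_dim, action_dim, max_delay, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = state_dim + action_dim * max_delay
        self.W = rng.standard_normal((in_dim, hidden_dim)) / np.sqrt(in_dim)
        self.action_dim = action_dim
        self.max_delay = max_delay

    def encode(self, delayed_state, action_history):
        # action_history: actions taken since delayed_state was observed;
        # its length varies with the (possibly random) delay.
        pad = np.zeros(self.action_dim * (self.max_delay - len(action_history)))
        flat = np.concatenate(
            [delayed_state] + [np.asarray(a) for a in action_history] + [pad]
        )
        return np.tanh(flat @ self.W)  # fixed-size hidden state

encoder = DelayEncoder(state_dim=4, action_dim=2, max_delay=3, hidden_dim=8)
s_delayed = np.zeros(4)
h1 = encoder.encode(s_delayed, [np.ones(2)])       # delay of 1 step
h3 = encoder.encode(s_delayed, [np.ones(2)] * 3)   # delay of 3 steps
assert h1.shape == h3.shape == (8,)  # same input size for the downstream policy
```

Because both hidden states have the same dimension regardless of the delay, the downstream RL algorithm is unchanged apart from its input layer, which is the integration property the abstract claims.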