Memory models such as Recurrent Neural Networks (RNNs) and Transformers address Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent Markov states. Neither scales particularly well to long sequences, especially compared to an emerging class of memory models sometimes called linear recurrent models. We find that the recurrent update of these models can be expressed as a monoid, leading us to reformulate existing architectures within a novel memory monoid framework. We revisit the traditional approach to batching in recurrent RL, highlighting both theoretical and empirical deficiencies. Leveraging the properties of memory monoids, we propose a batching method that improves sample efficiency, increases the return, and simplifies the implementation of recurrent loss functions in RL.
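To make the monoid framing concrete, the following is a minimal JAX sketch, not the paper's implementation: a memory monoid for a hypothetical diagonal linear recurrence s_t = a_t * s_{t-1} + b_t. The names `monoid_op`, `a`, and `b` are illustrative assumptions. Because the binary operator is associative, the recurrent states can be computed with a parallel scan rather than a sequential loop, which is the property that lets linear recurrent models scale to long sequences.

```python
import jax
import jax.numpy as jnp

def monoid_op(x, y):
    # Composition of two affine updates: applying (a1, b1) then (a2, b2)
    # is equivalent to the single update (a2 * a1, a2 * b1 + b2).
    # This operator is associative, so it defines a monoid
    # (with identity element (1, 0)).
    a1, b1 = x
    a2, b2 = y
    return a2 * a1, a2 * b1 + b2

# Hypothetical example: T timesteps of a D-dimensional diagonal recurrence.
T, D = 8, 4
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.uniform(key_a, (T, D))  # per-step decay
b = jax.random.normal(key_b, (T, D))   # per-step input contribution

# Parallel scan over the monoid yields all recurrent states s_1..s_T at once.
_, states = jax.lax.associative_scan(monoid_op, (a, b))

# Sequential reference computation for comparison.
s, ref = jnp.zeros(D), []
for t in range(T):
    s = a[t] * s + b[t]
    ref.append(s)
assert jnp.allclose(states, jnp.stack(ref), atol=1e-5)
```

Because the scan only requires associativity, any recurrent model whose update can be cast in this form inherits the same parallelization, which is what the memory monoid framework generalizes.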