Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also offer fast inference and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in the sequence length and performs better than LSTM models on a simple memory-based task. Then, by leveraging the model's ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task in which the agent is given a randomly-sampled continuous control environment, combined with a randomly-sampled linear projection of the environment's observations and actions. Furthermore, we show that the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper suggest that S4 models are a strong contender for the default architecture used for in-context reinforcement learning.
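The idea of resetting the hidden state inside a parallel scan, mentioned in the abstract, can be illustrated with a small sketch. This is not the paper's operator or implementation. It assumes a scalar linear recurrence x_t = a_t * x_{t-1} + b_t with a zero initial state, and it handles episode boundaries by zeroing the transition a_t wherever a reset flag is set. The names `combine`, `resettable_scan`, and `sequential_reference` are hypothetical, and the doubling loop in plain Python only mimics the data flow of a parallel scan.

```python
def combine(left, right):
    """Compose two affine maps x -> a*x + b, applying `left` first."""
    a_l, b_l = left
    a_r, b_r = right
    return (a_r * a_l, a_r * b_l + b_r)


def resettable_scan(decay, inputs, resets):
    """Inclusive scan over affine maps; reset steps discard earlier state.

    Assumption (not from the paper): the reset state is zero, so a reset
    can be expressed by setting the transition coefficient to zero.
    """
    elems = [(0.0 if r else a, b) for a, b, r in zip(decay, inputs, resets)]
    offset = 1
    # Doubling scan: each round could run over all positions in parallel.
    while offset < len(elems):
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(len(elems))
        ]
        offset *= 2
    # With x_0 = 0, the state at step t is the offset term of the composed map.
    return [b for _, b in elems]


def sequential_reference(decay, inputs, resets):
    """Step-by-step recurrence with explicit resets, used as a check."""
    x, out = 0.0, []
    for a, b, r in zip(decay, inputs, resets):
        if r:
            x = 0.0  # episode boundary: forget the previous state
        x = a * x + b
        out.append(x)
    return out
```

Because composing affine maps is associative, the same `combine` could in principle be passed to a library scan primitive; the sequential function is included only to check that both paths agree on sequences that contain resets.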