Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers in sequence length and performs better than LSTM models on a simple memory-based task. Then, by leveraging the model's ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task in which the agent is given a randomly-sampled continuous control environment, combined with a randomly-sampled linear projection of the environment's observations and actions. Furthermore, we show that the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper suggest that S4 models are a strong contender for the default architecture used for in-context reinforcement learning.
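To illustrate what resetting the hidden state "in parallel" can mean, the following is a minimal sketch rather than the paper's implementation. It assumes a scalar linear recurrence x_t = a * x_{t-1} + u_t with a zero initial state, where a reset flag at step t discards x_{t-1} before the update. The names `combine`, `tree_scan`, `resettable_states`, and `sequential_states` are hypothetical and chosen for this example. The point is that the reset can be folded into an associative operator, so the states can be computed by a tree-structured prefix scan instead of a step-by-step loop.

```python
from typing import List, Tuple

# Hypothetical element layout for this sketch:
# (decay a, accumulated input b, True if a reset occurred inside this span)
Element = Tuple[float, float, bool]


def combine(left: Element, right: Element) -> Element:
    """Associative operator: apply the span `left`, then the span `right`."""
    a_l, b_l, d_l = left
    a_r, b_r, d_r = right
    if d_r:
        # The right span contains a reset, so everything before it is discarded.
        return right
    # Ordinary composition of the affine maps x -> a*x + b.
    return (a_r * a_l, a_r * b_l + b_r, d_l)


def tree_scan(elems: List[Element]) -> List[Element]:
    """Inclusive prefix scan with a divide-and-conquer structure.

    Each level combines disjoint pairs independently, which is the part a
    parallel backend could execute concurrently.
    """
    n = len(elems)
    if n <= 1:
        return list(elems)
    pairs = [combine(elems[i], elems[i + 1]) for i in range(0, n - 1, 2)]
    scanned = tree_scan(pairs)  # scanned[k] covers elems[0 .. 2k+1]
    out = [elems[0]]
    for i in range(1, n):
        if i % 2 == 1:
            out.append(scanned[i // 2])
        else:
            out.append(combine(scanned[i // 2 - 1], elems[i]))
    return out


def resettable_states(a: float, inputs: List[float], resets: List[bool]) -> List[float]:
    """Hidden states via the scan, assuming a zero initial state."""
    elems = [(a, u, r) for u, r in zip(inputs, resets)]
    return [b for _, b, _ in tree_scan(elems)]


def sequential_states(a: float, inputs: List[float], resets: List[bool]) -> List[float]:
    """Reference implementation: an explicit loop with resets."""
    x, out = 0.0, []
    for u, r in zip(inputs, resets):
        if r:
            x = 0.0  # episode boundary: forget the previous state
        x = a * x + u
        out.append(x)
    return out


if __name__ == "__main__":
    inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
    resets = [False, False, True, False, False]
    print(resettable_states(0.5, inputs, resets))
    print(sequential_states(0.5, inputs, resets))
```

In an array framework, the same operator would need to be written with element-wise selection instead of a Python `if` before it could be handed to a scan primitive such as `jax.lax.associative_scan`; that vectorised form is not shown here.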