Structured State Space Models for In-Context Reinforcement Learning

Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks. These models also have fast inference speeds and parallelisable training, making them potentially useful in many reinforcement learning settings. We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks. We show that our modified architecture runs asymptotically faster than Transformers and performs better than LSTM models on a simple memory-based task. Then, by leveraging the model's ability to handle long-range sequences, we achieve strong performance on a challenging meta-learning task in which the agent is given a randomly-sampled continuous control environment, combined with a randomly-sampled linear projection of the environment's observations and actions. Furthermore, we show that the resulting model can adapt to out-of-distribution held-out tasks. Overall, the results presented in this paper suggest that S4 models are a strong contender for the default architecture used for in-context reinforcement learning.
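To make the idea of resetting the hidden state in parallel more concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It assumes a scalar linear recurrence x_t = a_t * x_{t-1} + b_t with an initial state of zero, and a flag d_t marking that a new episode starts at step t. The names `combine`, `parallel_scan`, and `sequential_scan` are hypothetical, and the operator shown is one associative choice that may differ from the one used in the paper.

```python
# Each element is (a, b, d): the step map x -> a * x + b, plus a flag d that is
# True when the segment contains an episode reset (assumed initial state: 0).

def combine(left, right):
    """Compose two segments; a reset in `right` discards everything before it."""
    a_l, b_l, d_l = left
    a_r, b_r, d_r = right
    if d_r:
        # The right segment restarts from the zero state, so the left is ignored.
        return right
    # Ordinary composition of affine maps: right(left(x)).
    return (a_r * a_l, a_r * b_l + b_r, d_l)


def parallel_scan(elems):
    """Inclusive prefix scan with doubling offsets (Hillis-Steele pattern).

    Every update within one pass is independent of the others, which is what
    lets a real implementation run each pass in parallel.
    """
    cur = list(elems)
    offset = 1
    while offset < len(cur):
        cur = [
            combine(cur[i - offset], cur[i]) if i >= offset else cur[i]
            for i in range(len(cur))
        ]
        offset *= 2
    # With a zero initial state, the hidden state is the b component.
    return [b for (_, b, _) in cur]


def sequential_scan(elems):
    """Reference implementation: step through time, resetting when flagged."""
    x = 0.0
    out = []
    for a, b, d in elems:
        if d:
            x = 0.0
        x = a * x + b
        out.append(x)
    return out
```

Because `combine` is associative, the scan can group elements in any order and still agree with the step-by-step loop. For example, with a decay of 0.5, inputs 1 to 5, and a reset at the fourth step, both functions return the same hidden states, and the state after the reset depends only on inputs from the new episode.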

