Masked Generative Priors Improve World Models Sequence Modelling Capabilities

Deep Reinforcement Learning (RL) has become the leading approach for creating artificial agents in complex environments. Model-based approaches, which are RL methods with world models that predict environment dynamics, are among the most promising directions for improving data efficiency, forming a critical step toward bridging the gap between research and real-world deployment. In particular, world models enhance sample efficiency by learning in imagination, which involves training a generative sequence model of the environment in a self-supervised manner. Recently, Masked Generative Modelling has emerged as a more efficient and effective inductive bias for modelling and generating token sequences. Building on the Efficient Stochastic Transformer-based World Models (STORM) architecture, we replace the traditional MLP prior with a Masked Generative Prior (e.g., MaskGIT Prior) and introduce GIT-STORM. We evaluate our model on two downstream tasks: reinforcement learning and video prediction. GIT-STORM demonstrates substantial performance gains in RL tasks on the Atari 100k benchmark. Moreover, we apply Transformer-based World Models to continuous action environments for the first time, addressing a significant gap in prior research. To achieve this, we employ a state mixer function that integrates latent state representations with actions, enabling our model to handle continuous control tasks. We validate this approach through qualitative and quantitative analyses on the DeepMind Control Suite, showcasing the effectiveness of Transformer-based World Models in this new domain. Our results highlight the versatility and efficacy of the MaskGIT dynamics prior, paving the way for more accurate world models and more effective RL policies.
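The abstract does not say how the state mixer is built, so the following is only a minimal sketch of one plausible form: concatenate the latent state with the continuous action vector and pass the result through a single linear layer. The names `make_linear` and `state_mixer`, the dimensions, and the concatenate-then-project design are all assumptions made for illustration, not the authors' implementation.

```python
import random


def make_linear(in_dim, out_dim, rng):
    """Return (weights, bias) for a dense layer with small random weights.

    Hypothetical helper: stands in for a learned projection.
    """
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def state_mixer(latent, action, layer):
    """Concatenate a latent state with a continuous action, then project.

    Assumption: mixing is a plain concatenation followed by a linear map.
    """
    weights, bias = layer
    x = list(latent) + list(action)
    if len(x) != len(weights[0]):
        raise ValueError("latent + action size does not match the layer input size")
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Illustrative sizes only; none of these come from the paper.
rng = random.Random(0)
latent_dim, action_dim, mixed_dim = 8, 2, 4
layer = make_linear(latent_dim + action_dim, mixed_dim, rng)

latent = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
action = [0.3, -0.7]  # a continuous action, e.g. two joint torques
mixed = state_mixer(latent, action, layer)
print(len(mixed))
```

Because the action enters as a real-valued vector rather than an index into an embedding table, the same mixed representation can be fed to a sequence model for continuous control tasks such as those in the DeepMind Control Suite.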
