Mimicking real interaction trajectories during world-model inference has been shown to improve the sample efficiency of model-based reinforcement learning (MBRL) algorithms. Many methods reason directly over known state sequences; however, this approach fails to improve reasoning quality by capturing the subtle variation between states. Much as humans infer trends in event development from such variation, in this work we introduce the Global-Local variation Awareness Mamba-based world model (GLAM), which improves reasoning quality by perceiving and predicting variation between states. GLAM comprises two Mamba-based parallel reasoning modules, GMamba and LMamba, which perceive variation from global and local perspectives, respectively, during the reasoning process. GMamba identifies patterns of variation between states in the input sequence and leverages these patterns to enhance the prediction of future state variation. LMamba emphasizes reasoning about unknown information, such as rewards, termination signals, and visual representations, by perceiving variation between adjacent states. By integrating the strengths of the two modules, GLAM accounts for higher-value variation in environmental changes, providing the agent with more efficient imagination-based training. We demonstrate that our method outperforms existing methods in normalized human score on the Atari 100k benchmark.
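The core idea above, reasoning over the variation between states rather than the states themselves, can be illustrated with a minimal numerical sketch. This is not the authors' implementation: the delta computation, the mean-based "global pattern", and the additive prediction step are simplifying assumptions standing in for the learned GMamba/LMamba sequence models.

```python
import numpy as np

def local_variation(states):
    """LMamba-style input (assumed): variation between adjacent
    states, s_t - s_{t-1}, over the trajectory."""
    return np.diff(states, axis=0)            # shape (T-1, D)

def global_variation_pattern(deltas):
    """GMamba-style summary (assumed): a pattern extracted over all
    per-step variations; a simple mean stands in for learned
    Mamba-based sequence modeling."""
    return deltas.mean(axis=0)                # shape (D,)

# Toy state trajectory: T=4 steps, D=2 features
states = np.array([[0.0, 0.0],
                   [1.0, 0.5],
                   [2.0, 1.0],
                   [3.0, 1.5]])

deltas = local_variation(states)              # each row is [1.0, 0.5]
pattern = global_variation_pattern(deltas)    # [1.0, 0.5]

# Predict the next state by applying the global variation pattern
# to the most recent state (illustrative fusion of the two views).
next_state = states[-1] + pattern             # [4.0, 2.0]
print(next_state)
```

The point of the sketch is the representational choice: feeding the model state differences exposes the trend information directly, instead of forcing the sequence model to recover it from raw states.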