Image-based reinforcement learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. Although approaches that follow the bisimulation principle show potential for learning state representations that address this issue, they still suffer from the limited expressive capacity of their latent dynamics and adapt poorly to sparse-reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information with reward-specific knowledge. ReBis uses a transformer architecture to model the dynamics implicitly and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines a bisimulation-based loss with an asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, Atari games and the DeepMind Control Suite, show that ReBis outperforms existing methods, demonstrating its effectiveness.
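To make the idea of block-wise masking over space and time concrete, the following is a minimal sketch, not the ReBis implementation. It assumes a sequence of T frames, each split into an H x W grid of patches, and masks whole blocks of neighbouring patches together. The function name `blockwise_mask`, the block shape, and the masking ratio are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import random


def blockwise_mask(T, H, W, block=(2, 2, 2), mask_ratio=0.5, seed=None):
    """Return a nested T x H x W list of booleans; True marks a masked patch.

    Illustrative assumption: the patch grid is tiled with blocks of shape
    `block` (frames, rows, cols), and a fraction `mask_ratio` of the blocks
    is masked as a unit, so each masked region covers adjacent patches in
    both space and time.
    """
    rng = random.Random(seed)
    bt, bh, bw = block
    # Number of blocks along each axis (ceiling division covers ragged edges).
    nt, nh, nw = -(-T // bt), -(-H // bh), -(-W // bw)
    blocks = [(i, j, k) for i in range(nt) for j in range(nh) for k in range(nw)]
    # Pick the blocks to mask, uniformly and without replacement.
    n_masked = round(mask_ratio * len(blocks))
    masked = set(rng.sample(blocks, n_masked))
    # A patch is masked exactly when the block containing it was selected.
    return [
        [[(t // bt, h // bh, w // bw) in masked for w in range(W)] for h in range(H)]
        for t in range(T)
    ]


# Example: 4 frames of 6 x 6 patches, masked in 2 x 3 x 3 blocks.
mask = blockwise_mask(T=4, H=6, W=6, block=(2, 3, 3), mask_ratio=0.5, seed=0)
```

Masking whole blocks rather than independent patches means a masked patch cannot be trivially recovered by copying its immediate spatial or temporal neighbours, which is one plausible reading of how such masking reduces spatiotemporal redundancy.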