Image-based reinforcement learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. Although approaches that follow the bisimulation principle show potential for learning state representations that address this issue, they are still limited by the expressive capacity of their latent dynamics and adapt poorly to sparse-reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information with reward-specific knowledge. ReBis uses a transformer architecture to implicitly model the dynamics and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines a bisimulation-based loss with an asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, Atari games and the DeepMind Control Suite, demonstrate that ReBis outperforms existing methods.
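As a rough illustration of the block-wise masking mentioned in the abstract, the sketch below masks whole spatiotemporal blocks of a patch grid rather than individual patches. The abstract does not specify the masking procedure, so the function name `blockwise_mask`, the block size, the mask ratio, and the uniform random choice of blocks are all assumptions made for illustration, not the authors' implementation.

```python
import random


def blockwise_mask(T, H, W, block=(2, 2, 2), mask_ratio=0.5, seed=0):
    """Return a T x H x W nested list of booleans; True marks a masked patch.

    Hypothetical helper: every patch inside the same (time, height, width)
    block shares one mask decision, so a masked region cannot be trivially
    filled in from its immediate spatial or temporal neighbours.
    """
    bt, bh, bw = block
    # Number of blocks along each axis (ceiling division covers the edges).
    nt, nh, nw = -(-T // bt), -(-H // bh), -(-W // bw)
    blocks = [(i, j, k) for i in range(nt) for j in range(nh) for k in range(nw)]

    # Assumption: blocks are chosen uniformly at random without replacement.
    rng = random.Random(seed)
    n_masked = round(mask_ratio * len(blocks))
    masked = set(rng.sample(blocks, n_masked))

    # A patch is masked exactly when the block that contains it was selected.
    return [
        [[(t // bt, h // bh, w // bw) in masked for w in range(W)] for h in range(H)]
        for t in range(T)
    ]


mask = blockwise_mask(T=4, H=4, W=4)
n_masked_patches = sum(cell for frame in mask for row in frame for cell in row)
```

With a 4 x 4 x 4 grid and 2 x 2 x 2 blocks there are eight blocks, so a mask ratio of 0.5 selects four of them and masks 32 of the 64 patches. In a transformer pipeline, the masked positions would typically be dropped or replaced by a learned token before encoding, though how ReBis handles them is not stated in the abstract.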