In reinforcement learning with image-based inputs, it is crucial to establish a robust and generalizable state representation. Recent advances in metric learning, such as deep bisimulation metric approaches, have shown promising results in learning structured, low-dimensional representation spaces from pixel observations, where the distance between states is measured in terms of task-relevant features. However, these approaches struggle on demanding generalization tasks and in scenarios with non-informative rewards, because they fail to capture sufficient long-term information in the learned representations. To address these challenges, we propose a novel State Chrono Representation (SCR) approach. SCR augments state metric-based representations by incorporating extensive temporal information into the update step of bisimulation metric learning. It learns state distances within a temporal framework that considers both future dynamics and cumulative rewards over current and long-term future states. Our learning strategy effectively incorporates future behavioral information into the representation space without introducing a significant number of additional parameters for modeling dynamics. Extensive experiments in the DeepMind Control and Meta-World environments demonstrate that SCR outperforms other recent metric-based methods on demanding generalization tasks. The code for SCR is available at https://github.com/jianda-chen/SCR.
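To make the bisimulation metric recursion that SCR builds on concrete, below is a minimal sketch of the standard metric target: the distance between two states is the difference in their immediate rewards plus the discounted distance between their successor states. The function names and the L1 embedding distance are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def embedding_distance(z1, z2):
    # L1 distance between latent state embeddings phi(s_i) and phi(s_j)
    # (the choice of L1 here is an assumption for illustration)
    return np.abs(z1 - z2).sum()

def bisim_target(r1, r2, d_next, gamma=0.99):
    # Standard bisimulation-style metric recursion:
    #   d(s_i, s_j) ~ |r_i - r_j| + gamma * d(s_i', s_j')
    # r1, r2: immediate rewards; d_next: distance between successor states
    return abs(r1 - r2) + gamma * d_next
```

In metric-based representation learning, the encoder is trained so that `embedding_distance` regresses toward `bisim_target`; SCR's contribution is to extend this one-step target with long-term future information.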