Visual reinforcement learning has proven effective in solving control tasks with high-dimensional observations. However, extracting reliable and generalizable representations from vision-based observations remains a central challenge. Inspired by the human thought process, we posit that a representation extracted from an observation is reliable and accurate in comprehending the environment when it can both predict the future and trace back the history. Based on this idea, we introduce the Bidirectional Transition (BiT) model, which extracts reliable representations by predicting environmental transitions in both the forward and backward directions. Our model demonstrates competitive generalization performance and sample efficiency on two settings of the DeepMind Control suite. Additionally, we use robotic manipulation and CARLA simulators to demonstrate the wide applicability of our method.
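As a rough illustration of what bidirectional transition prediction could look like, the sketch below writes one possible training objective. All symbols are assumptions introduced here for illustration and are not taken from the abstract: an encoder $\phi$ mapping an observation $o_t$ to a latent $z_t$, a forward model $F$, a backward model $B$, and weights $\lambda_f, \lambda_b$. The actual BiT objective may differ.

```latex
% Hypothetical notation: z_t = \phi(o_t) is the latent of observation o_t,
% a_t is the action taken at step t.
\begin{align}
  \mathcal{L}_{\text{fwd}} &= \bigl\| F(z_t, a_t) - z_{t+1} \bigr\|_2^2
    && \text{(predict the future latent)} \\
  \mathcal{L}_{\text{bwd}} &= \bigl\| B(z_{t+1}, a_t) - z_t \bigr\|_2^2
    && \text{(trace back the previous latent)} \\
  \mathcal{L}_{\text{bi}}  &= \lambda_f \, \mathcal{L}_{\text{fwd}}
    + \lambda_b \, \mathcal{L}_{\text{bwd}}
\end{align}
```

Under this reading, the encoder is rewarded only when its latents carry enough information to move along a transition in either direction, which is one way to formalize "predict the future and trace history".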