Recent work has shown that representation learning plays a critical role in sample-efficient reinforcement learning (RL) from pixels. Unfortunately, in real-world scenarios, representation learning is usually fragile to task-irrelevant distractions such as variations in background or viewpoint. To tackle this problem, we propose a novel clustering-based approach, namely Clustering with Bisimulation Metrics (CBM), which learns robust representations by grouping visual observations in the latent space. Specifically, CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments. Computing cluster assignments with bisimulation metrics enables CBM to capture task-relevant information, as bisimulation metrics quantify the behavioral similarity between observations. Moreover, CBM encourages the consistency of representations within each group, which facilitates filtering out task-irrelevant information and thus induces representations that are robust to distractions. An appealing feature is that CBM can achieve sample-efficient representation learning even when multiple distractions are present simultaneously. Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms and achieves state-of-the-art performance in both multiple-distraction and single-distraction settings. The code is available at https://github.com/MIRALab-USTC/RL-CBM.
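The alternation between the two steps can be illustrated with a toy, k-means-style sketch. This is not the released implementation: the distance below is a simplified surrogate for a bisimulation metric (reward gap plus a discounted distance between next-step latents), and the names `surrogate_distance`, `assign`, `update`, `cluster`, the dictionary fields, and the value of `gamma` are all hypothetical choices made for illustration.

```python
import math


def surrogate_distance(obs, proto, gamma=0.9):
    """Assumed stand-in for a bisimulation distance: reward gap plus a
    discounted Euclidean distance between next-step latents."""
    reward_gap = abs(obs["reward"] - proto["reward"])
    transition_gap = math.dist(obs["next_latent"], proto["next_latent"])
    return reward_gap + gamma * transition_gap


def assign(observations, prototypes, gamma=0.9):
    """Step 1: place each observation in the cluster of its closest prototype."""
    return [
        min(range(len(prototypes)),
            key=lambda k: surrogate_distance(obs, prototypes[k], gamma))
        for obs in observations
    ]


def update(observations, assignments, prototypes):
    """Step 2: recompute each prototype as the mean of its assigned members."""
    new_prototypes = []
    for k, old in enumerate(prototypes):
        members = [o for o, a in zip(observations, assignments) if a == k]
        if not members:
            # An empty cluster keeps its previous prototype.
            new_prototypes.append(old)
            continue
        n = len(members)
        dim = len(old["next_latent"])
        new_prototypes.append({
            "reward": sum(m["reward"] for m in members) / n,
            "next_latent": [
                sum(m["next_latent"][d] for m in members) / n
                for d in range(dim)
            ],
        })
    return new_prototypes


def cluster(observations, prototypes, n_iters=10, gamma=0.9):
    """Alternate the two steps until the assignments stop changing."""
    assignments = assign(observations, prototypes, gamma)
    for _ in range(n_iters):
        prototypes = update(observations, assignments, prototypes)
        new_assignments = assign(observations, prototypes, gamma)
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return assignments, prototypes


# Four made-up observations forming two behaviorally distinct groups.
observations = [
    {"reward": 0.0, "next_latent": [0.0, 0.1]},
    {"reward": 0.1, "next_latent": [0.2, 0.0]},
    {"reward": 1.0, "next_latent": [5.0, 5.1]},
    {"reward": 0.9, "next_latent": [4.8, 5.0]},
]
initial_prototypes = [
    {"reward": 0.0, "next_latent": [0.0, 0.0]},
    {"reward": 1.0, "next_latent": [5.0, 5.0]},
]
assignments, prototypes = cluster(observations, initial_prototypes)
print(assignments)
```

Note that the surrogate distance never looks at pixel content, only at rewards and next-step latents. That is the intuition behind grouping by behavioral similarity: two observations that differ only in background or viewpoint would fall into the same cluster under such a distance.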