This paper addresses the problem of visual feature representation learning with the aim of improving the performance of end-to-end reinforcement learning (RL) models. Specifically, a novel architecture is proposed that uses a heterogeneous loss function, called CRC loss, to learn improved visual features which can then be used for policy learning in RL. The CRC loss is a combination of three individual loss functions, namely contrastive, reconstruction and consistency loss. The feature representation is learned in parallel with the policy, and weight updates are shared through a Siamese twin encoder model. This encoder model is augmented with a decoder network and a feature projection network to facilitate computation of the three loss components. Through empirical analysis involving latent feature visualization, an attempt is made to provide insight into the role this loss function plays in learning new action-dependent features and how these features relate to the complexity of the problems being solved. The proposed architecture, called CRC-RL, is shown to outperform existing state-of-the-art methods on the challenging DeepMind Control Suite environments by a significant margin, thereby setting a new benchmark in this field.
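To make the structure of the CRC loss concrete, the following is a minimal sketch of how three such terms might be combined. The abstract does not specify the exact form of each term or how they are weighted, so everything here is an assumption made for illustration: the contrastive term is taken to be an InfoNCE-style loss over cosine similarities, the reconstruction and consistency terms are taken to be mean squared errors, and the combination is taken to be a weighted sum. The function names and the weights `w_con`, `w_rec` and `w_cons` are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)  # guard against zero-length vectors


def mean_squared_error(x, y):
    """Mean of squared element-wise differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def contrastive_loss(query, keys, positive_index, temperature=0.1):
    """InfoNCE-style loss (assumed form): the query should match the key at
    positive_index better than every other key."""
    logits = [cosine(query, k) / temperature for k in keys]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[positive_index]


def reconstruction_loss(image, decoded):
    """Assumed form: MSE between the input pixels and the decoder output."""
    return mean_squared_error(image, decoded)


def consistency_loss(latent_a, latent_b):
    """Assumed form: MSE between two latent codes that should agree."""
    return mean_squared_error(latent_a, latent_b)


def crc_loss(query, keys, positive_index, image, decoded, latent_a, latent_b,
             w_con=1.0, w_rec=1.0, w_cons=1.0):
    """Weighted sum of the three terms; the weights are hypothetical."""
    return (w_con * contrastive_loss(query, keys, positive_index)
            + w_rec * reconstruction_loss(image, decoded)
            + w_cons * consistency_loss(latent_a, latent_b))
```

In a full training loop the vectors would be produced by the encoder, the feature projection network and the decoder, and the total would be minimized alongside the RL objective; the sketch only shows how the three terms could be assembled once those outputs exist.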