This paper addresses the problem of visual feature representation learning with the aim of improving the performance of end-to-end reinforcement learning (RL) models. Specifically, a novel architecture is proposed that uses a heterogeneous loss function, called CRC loss, to learn improved visual features which can then be used for policy learning in RL. The CRC loss is a combination of three individual loss functions, namely contrastive, reconstruction, and consistency loss. The feature representation is learned in parallel with policy learning, and weight updates are shared through a Siamese twin encoder model. This encoder model is augmented with a decoder network and a feature projection network to facilitate computation of the above loss components. Through empirical analysis involving latent feature visualization, an attempt is made to provide insight into the role played by this loss function in learning new action-dependent features and how these features are linked to the complexity of the problems being solved. The proposed architecture, called CRC-RL, is shown to outperform existing state-of-the-art methods on the challenging DeepMind Control Suite environments by a significant margin, thereby setting a new benchmark in this field.
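As a rough illustration of how a three-part loss of this kind could be assembled, the following NumPy sketch combines a contrastive, a reconstruction, and a consistency term into one scalar. The abstract does not give the exact form of each term, so everything specific here is an assumption: the contrastive term is written as an InfoNCE loss over projected features, the reconstruction term as a mean squared error between the decoder output and the observation, the consistency term as a mean squared error between the latent features of the two encoder branches, and the three weights default to 1. All function and parameter names are hypothetical.

```python
import numpy as np


def info_nce_loss(q, k, temperature=0.1):
    """Assumed contrastive term: row i of q should match row i of k."""
    # L2-normalise so that the dot product is a cosine similarity.
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = q @ k.T / temperature                       # (B, B) similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal; average their negative log-probability.
    return -np.mean(np.diag(log_probs))


def reconstruction_loss(decoded, target):
    """Assumed reconstruction term: mean squared error on observations."""
    return np.mean((decoded - target) ** 2)


def consistency_loss(z_query, z_key):
    """Assumed consistency term: mean squared error between the two latents."""
    return np.mean((z_query - z_key) ** 2)


def crc_loss(q_proj, k_proj, decoded, target, z_query, z_key,
             w_contrastive=1.0, w_reconstruction=1.0, w_consistency=1.0):
    """Weighted sum of the three terms; the weights are illustrative only."""
    return (w_contrastive * info_nce_loss(q_proj, k_proj)
            + w_reconstruction * reconstruction_loss(decoded, target)
            + w_consistency * consistency_loss(z_query, z_key))


# Usage with random stand-ins for the network outputs (batch of 8).
rng = np.random.default_rng(0)
z_query = rng.normal(size=(8, 32))        # latent from the query encoder
z_key = rng.normal(size=(8, 32))          # latent from the key encoder
q_proj = rng.normal(size=(8, 16))         # projection of the query latent
k_proj = rng.normal(size=(8, 16))         # projection of the key latent
observation = rng.normal(size=(8, 192))   # flattened image observation
decoded = rng.normal(size=(8, 192))       # decoder output
print(crc_loss(q_proj, k_proj, decoded, observation, z_query, z_key))
```

In a training loop, the arrays above would come from the encoder, projection, and decoder networks rather than a random generator, and the resulting scalar would be minimised alongside the RL objective.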