Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning

Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image. The changed pixels can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, our experiments also show that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).
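The claim that a single variable can alter most of an image is easy to check on a toy frame. The sketch below is illustrative only and is not taken from the paper: the `render` and `changed_fraction` helpers, the 64x64 frame size, and the 8x8 white square standing in for a task-relevant object are all assumptions made for the example.

```python
import numpy as np

def render(background, size=64, agent=8):
    """Render a toy RGB frame: uniform background plus a white square 'agent'.

    Hypothetical helper for illustration; not part of TED.
    """
    frame = np.empty((size, size, 3), dtype=np.uint8)
    frame[:] = background                    # broadcast one RGB colour to every pixel
    frame[:agent, :agent] = (255, 255, 255)  # task-relevant object, same in every frame
    return frame

def changed_fraction(a, b):
    """Fraction of pixel locations whose colour differs between two frames."""
    return float(np.any(a != b, axis=-1).mean())

red = render((200, 30, 30))
blue = render((30, 30, 200))
frac = changed_fraction(red, blue)
print(f"{frac:.1%} of pixels changed")  # prints "98.4% of pixels changed"
```

Only the background colour differs between the two frames, yet all pixels outside the 8x8 square change. An encoder that does not separate this variable from the rest of the scene can therefore map the two frames to very different latent representations.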