The success of Reinforcement Learning (RL) relies heavily on the ability to learn robust representations from observations of the environment. In most cases, representations learned purely through the RL loss can differ vastly across states, depending on how the value functions change. However, the learned representations need not be highly specific to the task at hand. Relying only on the RL objective may yield representations that vary greatly across successive time steps. In addition, since the RL loss has a changing target, the learned representations depend on how good the current values/policies are. Thus, disentangling the representations from the main task would allow them to focus not only on task-specific features but also on the environment dynamics. To this end, we propose locally constrained representations, where an auxiliary loss forces the state representations to be predictable by the representations of neighboring states. This encourages the representations to be driven not only by value/policy learning but also by an additional loss that keeps them from over-fitting to the value loss. We evaluate the proposed method on several well-known benchmarks and observe strong performance. In continuous control tasks in particular, our experiments show a significant performance improvement.
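The abstract does not specify the form of the neighbor predictor or how the auxiliary term is weighted, so the following is only a minimal sketch under assumed choices. It predicts each representation φ(s_t) from its immediate temporal neighbors φ(s_{t-1}) and φ(s_{t+1}) as a linear combination with two scalar weights. These weights are fitted by ridge-regularized least squares, and the auxiliary term is the mean squared prediction error. The names `fit_neighbor_weights`, `local_constraint_loss`, `combined_loss`, and the weight `beta` are hypothetical. The code computes loss values in plain Python. A real agent would express the same quantity in an autodiff framework so that its gradient shapes the encoder.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fit_neighbor_weights(phis: List[Vector], ridge: float = 1e-6) -> Tuple[float, float]:
    """Hypothetical predictor: fit scalars (a, b) so phi_t ~ a*phi_{t-1} + b*phi_{t+1}.

    Solves the 2x2 ridge-regularized normal equations, pooling all interior
    time steps and all feature dimensions.
    """
    # Accumulate the Gram matrix G = X^T X and the right-hand side r = X^T y.
    g11 = g12 = g22 = r1 = r2 = 0.0
    for t in range(1, len(phis) - 1):
        for prev, nxt, cur in zip(phis[t - 1], phis[t + 1], phis[t]):
            g11 += prev * prev
            g12 += prev * nxt
            g22 += nxt * nxt
            r1 += prev * cur
            r2 += nxt * cur
    # The ridge term keeps G invertible when the two neighbors are collinear.
    g11 += ridge
    g22 += ridge
    det = g11 * g22 - g12 * g12
    # Closed-form inverse of the symmetric 2x2 system.
    a = (g22 * r1 - g12 * r2) / det
    b = (g11 * r2 - g12 * r1) / det
    return a, b


def local_constraint_loss(phis: List[Vector], ridge: float = 1e-6) -> float:
    """Mean squared error of predicting each interior phi_t from its neighbors."""
    if len(phis) < 3:
        raise ValueError("need at least three consecutive representations")
    a, b = fit_neighbor_weights(phis, ridge)
    total, count = 0.0, 0
    for t in range(1, len(phis) - 1):
        for prev, nxt, cur in zip(phis[t - 1], phis[t + 1], phis[t]):
            err = cur - (a * prev + b * nxt)
            total += err * err
            count += 1
    return total / count


def combined_loss(rl_loss: float, phis: List[Vector], beta: float = 0.1) -> float:
    """Assumed objective: RL loss plus a beta-weighted local-constraint term."""
    return rl_loss + beta * local_constraint_loss(phis)
```

Under these assumptions, a trajectory whose representations change smoothly (for example, linearly in t) gets an auxiliary loss near zero. An isolated spike cannot be explained by its neighbors and is penalized. This illustrates how the constraint discourages representations that jump between successive time steps.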