Reinforcement learning agents deployed in the real world often have to cope with partially observable environments. Most agents therefore employ memory mechanisms to approximate the state of the environment. Recently, there have been impressive success stories in mastering partially observable environments, mostly in the realm of computer games such as Dota 2, StarCraft II, or Minecraft. However, existing methods lack interpretability, in the sense that humans cannot comprehend what the agent stores in its memory. To address this, we propose a novel memory mechanism that represents past events in human language. Our method uses CLIP to associate visual inputs with language tokens. We then feed these tokens to a pretrained language model that serves as the agent's memory and provides it with a coherent, human-readable representation of the past. We train our memory mechanism on a set of partially observable environments and find that it excels on tasks that require a memory component, while mostly attaining performance on par with strong baselines on tasks that do not. On a challenging continuous recognition task, where memorizing the past is crucial, our memory mechanism converges two orders of magnitude faster than prior methods. Since our memory mechanism is human-readable, we can peek at an agent's memory and check whether crucial pieces of information have been stored. This significantly enhances troubleshooting and paves the way toward more interpretable agents.
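The token-retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hand-written vectors stand in for CLIP image and text embeddings, the names `cosine`, `top_k_tokens`, and `TextMemory` are hypothetical, and the pretrained language model is left out entirely. The sketch only shows how an observation might be mapped to its closest language tokens and how those tokens could accumulate in a human-readable buffer.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_tokens(obs_embedding, vocab_embeddings, k=1):
    """Return the k tokens whose embeddings are most similar to the observation.

    Assumption: obs_embedding and the values of vocab_embeddings live in a
    shared space, as CLIP image and text embeddings would.
    """
    ranked = sorted(
        vocab_embeddings.items(),
        key=lambda item: cosine(obs_embedding, item[1]),
        reverse=True,
    )
    return [token for token, _ in ranked[:k]]


class TextMemory:
    """Fixed-size buffer of retrieved tokens (hypothetical helper).

    In the described method these tokens would be fed to a pretrained
    language model; here they are only stored and joined into a string.
    """

    def __init__(self, max_tokens=8):
        self.tokens = deque(maxlen=max_tokens)

    def write(self, tokens):
        self.tokens.extend(tokens)

    def read(self):
        return " ".join(self.tokens)


# Toy vocabulary with made-up 3-d embeddings (placeholders, not CLIP outputs).
vocab = {
    "key": [1.0, 0.0, 0.0],
    "door": [0.0, 1.0, 0.0],
    "wall": [0.0, 0.0, 1.0],
}

memory = TextMemory(max_tokens=4)
memory.write(top_k_tokens([0.9, 0.1, 0.0], vocab))  # observation closest to "key"
memory.write(top_k_tokens([0.1, 0.8, 0.2], vocab))  # observation closest to "door"
print(memory.read())
```

Because the buffer holds plain tokens, inspecting it amounts to reading a string, which is the property the abstract relies on for troubleshooting.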