In a partially observable Markov decision process (POMDP), an agent typically uses a representation of the past to approximate the underlying MDP. We propose to utilize a frozen Pretrained Language Transformer (PLT) for history representation and compression to improve sample efficiency. To avoid training the Transformer, we introduce FrozenHopfield, which automatically associates observations with pretrained token embeddings. To form these associations, a modern Hopfield network stores these token embeddings and retrieves them with queries obtained by a random but fixed projection of the observations. Our new method, HELM, enables actor-critic network architectures that contain a pretrained language Transformer as a memory module for history representation. Since a representation of the past need not be learned, HELM is much more sample-efficient than competitors. On the Minigrid and Procgen environments, HELM achieves new state-of-the-art results. Our code is available at https://github.com/ml-jku/helm.
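The FrozenHopfield association step can be illustrated with a minimal NumPy sketch. It assumes the standard modern Hopfield retrieval rule, in which a query q is mapped to Eᵀ softmax(β E q) over stored patterns E (here, the token embeddings). Several details are our own assumptions rather than the HELM repository's code: the class name `FrozenHopfieldSketch`, the Gaussian projection scaled by 1/sqrt(obs_dim), and the inverse temperature `beta`. A random matrix also stands in for real pretrained embeddings. Neither the projection nor the embeddings are trained.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


class FrozenHopfieldSketch:
    """Hypothetical sketch: map observations into a token-embedding space.

    Assumptions (not from the source): Gaussian projection scaled by
    1/sqrt(obs_dim), and a scalar inverse temperature `beta`.
    """

    def __init__(self, token_embeddings, obs_dim, beta=1.0, seed=0):
        # Stored patterns: frozen token embeddings, shape (vocab, d).
        self.E = np.asarray(token_embeddings, dtype=float)
        _, d = self.E.shape
        rng = np.random.default_rng(seed)
        # Random but fixed projection obs_dim -> d; it is never updated.
        self.P = rng.normal(0.0, 1.0 / np.sqrt(obs_dim), size=(d, obs_dim))
        self.beta = beta

    def __call__(self, obs):
        # obs: (obs_dim,) or (batch, obs_dim) -> queries of shape (..., d).
        q = np.asarray(obs, dtype=float) @ self.P.T
        # Similarity of each query to every stored embedding: (..., vocab).
        weights = softmax(self.beta * (q @ self.E.T))
        # Retrieval: convex combination of token embeddings, shape (..., d).
        return weights @ self.E, weights


rng = np.random.default_rng(1)
E = rng.normal(size=(50, 16))  # stand-in for pretrained token embeddings
fh = FrozenHopfieldSketch(E, obs_dim=8, beta=1.0)
obs = rng.normal(size=(4, 8))  # a batch of 4 flattened observations
token_like, weights = fh(obs)
print(token_like.shape)  # (4, 16)
```

The retrieved vectors lie in the convex hull of the embeddings the language model already knows, so they can be fed to the frozen Transformer in place of token embeddings. Larger `beta` sharpens the softmax toward the single nearest embedding (by dot product), and smaller values blend many embeddings.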