The recent success of the Transformer in natural language processing has sparked its use in many other domains. In offline reinforcement learning (RL), Decision Transformer (DT) is emerging as a promising Transformer-based model. However, we found that DT's attention module is poorly suited to capturing the inherent local dependence pattern in RL trajectories modeled as a Markov decision process. To overcome this limitation, we propose a novel action sequence predictor, Decision ConvFormer (DC), based on the MetaFormer architecture, a general structure for processing multiple entities in parallel and modeling the relationships among them. DC employs local convolution filtering as the token mixer and can effectively capture the inherent local associations in RL datasets. In extensive experiments, DC achieved state-of-the-art performance across various standard RL benchmarks while requiring fewer resources. Furthermore, we show that DC better understands the underlying meaning in the data and exhibits enhanced generalization capability.
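To make the idea of a convolutional token mixer concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a causal, per-channel (depthwise) filter over a short window of past tokens; the abstract states only that DC uses local convolution filtering as the token mixer, so the causal masking, the depthwise form, the window length, and the function names `causal_conv_mixer` and `mixer_block` are all illustrative assumptions. Normalization and the channel MLP of a MetaFormer-style block are omitted for brevity.

```python
def causal_conv_mixer(tokens, filters):
    """Mix each token with its recent predecessors, channel by channel.

    tokens:  list of T token embeddings, each a list of D floats
    filters: list of D per-channel filters, each a list of K weights;
             filters[d][k] weights the token k steps in the past
    (Hypothetical helper: causal and depthwise by assumption.)
    """
    num_tokens = len(tokens)
    num_channels = len(filters)
    mixed = []
    for t in range(num_tokens):
        row = []
        for d in range(num_channels):
            acc = 0.0
            for k, weight in enumerate(filters[d]):
                if t - k >= 0:  # causal: never look at future tokens
                    acc += weight * tokens[t - k][d]
            row.append(acc)
        mixed.append(row)
    return mixed


def mixer_block(tokens, filters):
    """Residual token-mixing step x + mixer(x); norm and MLP are omitted."""
    mixed = causal_conv_mixer(tokens, filters)
    return [[x + m for x, m in zip(tok, mix)] for tok, mix in zip(tokens, mixed)]
```

Unlike attention, where every token can attend to every earlier token, each output here depends only on the last K tokens, which is the kind of local dependence a Markov decision process suggests. With a window of K = 2, for example, changing the first token of a three-token sequence leaves the third output unchanged.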