The Decision Transformer (DT) has established sequence modeling as a powerful approach to offline reinforcement learning. It conditions its action predictions on Return-to-Go (RTG), using it both to distinguish trajectory quality during training and to guide action generation at inference. In this work, we identify a critical redundancy in this design: feeding the entire sequence of RTGs into the Transformer is theoretically unnecessary, as only the most recent RTG affects action prediction. Through experiments, we show that this redundancy can impair DT's performance. To resolve it, we propose the Decoupled DT (DDT). DDT simplifies the architecture by processing only the observation and action sequences through the Transformer, using the latest RTG alone to guide action prediction. This streamlined design not only improves performance but also reduces computational cost. Our experiments show that DDT significantly outperforms DT and achieves performance competitive with state-of-the-art DT variants across multiple offline RL tasks.
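The decoupling described above can be illustrated with a minimal sketch. This is not the authors' implementation: the module names, embedding dimensions, and the choice to fuse the latest RTG by concatenation with the final observation token's hidden state are all assumptions made for illustration.

```python
# Hypothetical sketch of the DDT idea: the Transformer sees only
# interleaved observation/action tokens; the latest RTG bypasses it
# and conditions the action head directly. All names and the fusion
# strategy are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class DecoupledDTSketch(nn.Module):
    def __init__(self, obs_dim, act_dim, d_model=64, n_layers=2, n_heads=2):
        super().__init__()
        self.embed_obs = nn.Linear(obs_dim, d_model)
        self.embed_act = nn.Linear(act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Action head takes the hidden state plus the scalar latest RTG.
        self.head = nn.Sequential(
            nn.Linear(d_model + 1, d_model), nn.ReLU(),
            nn.Linear(d_model, act_dim),
        )

    def forward(self, obs_seq, act_seq, latest_rtg):
        # obs_seq: (B, T, obs_dim); act_seq: (B, T, act_dim); latest_rtg: (B, 1)
        o = self.embed_obs(obs_seq)                         # (B, T, d)
        a = self.embed_act(act_seq)                         # (B, T, d)
        # Interleave as o1, a1, o2, a2, ... -> (B, 2T, d); no RTG tokens.
        tokens = torch.stack([o, a], dim=2).flatten(1, 2)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.encoder(tokens, mask=causal)
        # Hidden state of the most recent observation token, fused with
        # the latest RTG only, predicts the next action.
        h_last_obs = h[:, -2]
        return self.head(torch.cat([h_last_obs, latest_rtg], dim=-1))
```

In this sketch, dropping RTG tokens shortens each context from 3T tokens (RTG, observation, action per step, as in DT) to 2T, which is where the claimed computational saving would come from.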