Catastrophic forgetting poses a substantial challenge for intelligent agents controlled by a large model: performance degrades as these agents are trained on new tasks. We propose a novel solution, the Progressive Prompt Decision Transformer (P2DT). The method extends a transformer-based model by dynamically appending decision tokens while it is trained on each new task, which fosters task-specific policies and mitigates forgetting in continual and offline reinforcement learning scenarios. P2DT is trained on trajectories collected from all tasks via traditional reinforcement learning, and it generates new task-specific tokens during training, thereby retaining knowledge from previously learned tasks. Preliminary results indicate that the model alleviates catastrophic forgetting and scales well as the number of task environments increases.
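The token-appending idea can be illustrated with a minimal sketch. It is not the P2DT implementation: the class `ProgressivePromptPool`, its methods, the zero initialization, and the choices to freeze earlier prompts and to prepend all prompts learned so far are assumptions made for illustration only. Plain Python lists stand in for embedding tensors.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Token = List[float]  # one embedding vector (stand-in for a tensor row)


@dataclass
class ProgressivePromptPool:
    """Hypothetical pool of per-task prompt tokens (illustrative only)."""
    embed_dim: int
    tokens_per_task: int = 2
    prompts: Dict[str, List[Token]] = field(default_factory=dict)
    frozen: Dict[str, bool] = field(default_factory=dict)

    def add_task(self, task_id: str) -> None:
        """Append fresh prompt tokens for a new task; freeze older ones."""
        if task_id in self.prompts:
            raise ValueError(f"task {task_id!r} already registered")
        # Assumption: earlier prompts are frozen so that training on the
        # new task cannot overwrite them.
        for known in self.frozen:
            self.frozen[known] = True
        # Zero initialization is a placeholder, not a recommendation.
        self.prompts[task_id] = [
            [0.0] * self.embed_dim for _ in range(self.tokens_per_task)
        ]
        self.frozen[task_id] = False

    def trainable_tasks(self) -> List[str]:
        """Tasks whose prompt tokens would still receive gradient updates."""
        return [t for t, is_frozen in self.frozen.items() if not is_frozen]

    def build_input(self, task_id: str, trajectory: List[Token]) -> List[Token]:
        """Prepend prompts of all tasks learned up to `task_id` (assumption)."""
        if task_id not in self.prompts:
            raise KeyError(task_id)
        prefix: List[Token] = []
        # Dicts preserve insertion order, i.e. the order tasks were learned.
        for known, tokens in self.prompts.items():
            prefix.extend(tokens)
            if known == task_id:
                break
        return prefix + list(trajectory)


pool = ProgressivePromptPool(embed_dim=4)
pool.add_task("task_a")
pool.add_task("task_b")
trajectory = [[1.0] * 4 for _ in range(3)]  # e.g. return, state, action tokens
sequence = pool.build_input("task_b", trajectory)
print(len(sequence), pool.trainable_tasks())
```

In this sketch the sequence fed to the transformer grows by `tokens_per_task` entries per task, and only the newest task's tokens remain trainable; whether P2DT freezes or concatenates prompts in exactly this way is not stated in the abstract.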