Offline reinforcement learning (RL) is a challenging task whose objective is to learn policies from static trajectory data without interacting with the environment. Recently, offline RL has been framed as a sequence modeling problem, in which an agent generates subsequent actions conditioned on a set of static transition experiences. However, existing approaches that use transformers to naively attend to all tokens can overlook the dependencies between different tokens and limit long-term dependency learning. In this paper, we propose the Graph Decision Transformer (GDT), a novel offline RL approach that models the input sequence as a causal graph to capture potential dependencies between fundamentally different concepts and to facilitate the learning of temporal and causal relationships. GDT uses a graph transformer with relation-enhanced mechanisms to process the graph inputs, and an optional sequence transformer to handle fine-grained spatial information in visual tasks. Our experiments show that GDT matches or surpasses the performance of state-of-the-art offline RL methods on image-based Atari and OpenAI Gym benchmarks.
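To illustrate the contrast between naive attention over all tokens and attention restricted by a causal graph, the sketch below encodes a small graph over return-to-go (R), state (s), and action (a) tokens as an attention mask. The edge set used here (R_t depends on R_{t-1}, s_{t-1}, a_{t-1}; s_t depends on s_{t-1}, a_{t-1}; a_t depends on R_t, s_t) is a hypothetical choice for illustration, as are the names `causal_graph_mask` and `naive_causal_mask`. The abstract does not specify GDT's exact edges, and this sketch does not implement its relation-enhanced mechanism.

```python
from typing import List

# Token types within each timestep (ordering assumed: return-to-go, state, action).
RTG, STATE, ACTION = 0, 1, 2
TOKENS_PER_STEP = 3


def token_index(t: int, kind: int) -> int:
    """Flat position of the token of type `kind` at timestep t."""
    return TOKENS_PER_STEP * t + kind


def causal_graph_mask(num_steps: int) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    Hypothetical edges, chosen for illustration only:
      R_t <- R_{t-1}, s_{t-1}, a_{t-1}
      s_t <- s_{t-1}, a_{t-1}
      a_t <- R_t, s_t
    Every token also attends to itself.
    """
    n = TOKENS_PER_STEP * num_steps
    mask = [[i == j for j in range(n)] for i in range(n)]  # self-loops only

    def connect(child: int, parent: int) -> None:
        mask[child][parent] = True

    for t in range(num_steps):
        r, s, a = (token_index(t, k) for k in (RTG, STATE, ACTION))
        # The action is chosen from the current return-to-go and state.
        connect(a, r)
        connect(a, s)
        if t > 0:
            pr, ps, pa = (token_index(t - 1, k) for k in (RTG, STATE, ACTION))
            # Return-to-go and state depend only on the previous timestep.
            connect(r, pr)
            connect(r, ps)
            connect(r, pa)
            connect(s, ps)
            connect(s, pa)
    return mask


def naive_causal_mask(num_steps: int) -> List[List[bool]]:
    """Standard lower-triangular mask: every token sees all earlier tokens."""
    n = TOKENS_PER_STEP * num_steps
    return [[j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    T = 3
    graph = causal_graph_mask(T)
    naive = naive_causal_mask(T)
    a2 = token_index(2, ACTION)
    print("graph parents of a_2:", [j for j, ok in enumerate(graph[a2]) if ok])
    print("naive parents of a_2:", [j for j, ok in enumerate(naive[a2]) if ok])
```

Under these assumptions, the action token a_2 attends only to R_2, s_2, and itself (positions 6, 7, 8), whereas the naive mask lets it attend to all nine earlier-or-equal positions. Every graph edge points to an earlier or identical position, so the graph mask is a sparser subset of the naive causal mask.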