Large transformer models trained on diverse datasets have shown a remarkable ability to learn in-context, achieving high few-shot performance on tasks they were not explicitly trained to solve. In this paper, we study the in-context learning capabilities of transformers in decision-making problems, i.e., reinforcement learning (RL) for bandits and Markov decision processes. To do so, we introduce and study Decision-Pretrained Transformer (DPT), a supervised pretraining method where the transformer predicts an optimal action given a query state and an in-context dataset of interactions, across a diverse set of tasks. This procedure, while simple, produces a model with several surprising capabilities. We find that the pretrained transformer can be used to solve a range of RL problems in-context, exhibiting both exploration online and conservatism offline, despite not being explicitly trained to do so. The model also generalizes beyond the pretraining distribution to new tasks and automatically adapts its decision-making strategies to unknown structure. Theoretically, we show DPT can be viewed as an efficient implementation of Bayesian posterior sampling, a provably sample-efficient RL algorithm. We further leverage this connection to provide guarantees on the regret of the in-context algorithm yielded by DPT, and prove that it can learn faster than algorithms used to generate the pretraining data. These results suggest a promising yet simple path towards instilling strong in-context decision-making abilities in transformers.
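To make the pretraining objective and its connection to posterior sampling concrete, the following is a minimal sketch for a Bernoulli bandit, not the authors' implementation. It assumes a uniform prior over arm means, a uniformly random behaviour policy for collecting the in-context dataset, and Beta(1, 1) priors for the posterior-sampling step. The function names `make_pretraining_example` and `posterior_sample_action` are hypothetical. Because a bandit has no state, the query state is omitted.

```python
import random


def make_pretraining_example(num_arms, context_len, rng):
    """Build one (in-context dataset, optimal action) training pair.

    Assumption: the task is a Bernoulli bandit whose arm means are drawn
    uniformly from [0, 1]; the paper's actual task distribution may differ.
    """
    # Sample a task: one mean reward per arm.
    means = [rng.random() for _ in range(num_arms)]

    # In-context dataset of (arm, reward) interactions, collected by an
    # assumed uniformly random behaviour policy.
    context = []
    for _ in range(context_len):
        arm = rng.randrange(num_arms)
        reward = 1 if rng.random() < means[arm] else 0
        context.append((arm, reward))

    # Supervised label: the arm with the highest true mean reward.
    optimal_arm = max(range(num_arms), key=lambda a: means[a])
    return context, optimal_arm


def posterior_sample_action(context, num_arms, rng):
    """One step of posterior (Thompson) sampling with Beta(1, 1) priors.

    This is the classical algorithm that the abstract relates DPT to,
    written out explicitly rather than learned by a transformer.
    """
    successes = [1] * num_arms  # Beta alpha parameters
    failures = [1] * num_arms   # Beta beta parameters
    for arm, reward in context:
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

    # Draw one plausible mean per arm from its posterior, then act greedily
    # with respect to that draw.
    draws = [rng.betavariate(successes[a], failures[a]) for a in range(num_arms)]
    return max(range(num_arms), key=lambda a: draws[a])


if __name__ == "__main__":
    rng = random.Random(0)
    context, label = make_pretraining_example(num_arms=5, context_len=20, rng=rng)
    print("context:", context)
    print("optimal-action label:", label)
    print("posterior-sampling action:", posterior_sample_action(context, 5, rng))
```

In this sketch, a model trained to predict the label from `context` across many sampled tasks would be the supervised analogue of DPT, while `posterior_sample_action` shows the explicit Bayesian procedure that the paper argues such a model implements.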