Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action-space size and structure: introducing a new action space typically requires re-collecting data and re-training the model, which can be costly for some applications. In our work, we show that this issue can be mitigated by the proposed Headless-AD model, which, despite being trained only once, generalizes to discrete action spaces of variable size, semantic content, and order. Experimenting with Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD generalizes strongly to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations.
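The abstract does not spell out the mechanism, but one common way to decouple a policy from a fixed action-space size is to drop the fixed-dimensional output head and instead score candidate actions by similarity between a predicted embedding and per-action embeddings. The sketch below is purely illustrative of that general idea, not the paper's actual implementation; all names and the dot-product scoring rule are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(predicted_embedding, action_embeddings):
    """Pick the available action whose embedding is most similar
    (by dot product) to the embedding the model predicted.
    Works for any number of candidate actions."""
    scores = action_embeddings @ predicted_embedding
    return int(np.argmax(scores))

# A policy without a fixed output head can score action sets of any
# size: the same code path handles 3 actions or 10 actions.
emb_dim = 8
pred = rng.normal(size=emb_dim)            # stand-in for a model's output

small_set = rng.normal(size=(3, emb_dim))   # action space with 3 actions
large_set = rng.normal(size=(10, emb_dim))  # action space with 10 actions

a_small = select_action(pred, small_set)
a_large = select_action(pred, large_set)
print(a_small, a_large)
```

Because selection reduces to a similarity argmax over whatever embeddings are supplied at inference time, the size and ordering of the action set never appear in the model's parameters, which is the kind of invariance the abstract claims.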