Recently, it has been shown that transformers pre-trained on diverse datasets with multi-episode contexts can generalize to new reinforcement learning tasks in-context. A key limitation of previously proposed models is their reliance on a predefined action space size and structure. Introducing a new action space often requires data re-collection and model re-training, which can be costly for some applications. In our work, we show that it is possible to mitigate this issue with the proposed Headless-AD model that, despite being trained only once, is capable of generalizing to discrete action spaces of variable size, semantic content, and order. Through experiments on Bernoulli and contextual bandits, as well as a gridworld environment, we show that Headless-AD generalizes strongly to action spaces it has never encountered, even outperforming specialized models trained for a specific set of actions on several environment configurations. Implementation is available at: https://github.com/corl-team/headless-ad.