We investigate the use of transformer sequence models as dynamics models (TDMs) for control. We find that TDMs generalize well to unseen environments, both in a few-shot setting, where a generalist TDM is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist TDM is applied to an unseen environment without any further training. We further demonstrate that generalizing system dynamics can work much better than generalizing optimal behavior directly as a policy. Additional results show that TDMs also perform well in a single-environment learning setting when compared to a number of baseline models. These properties make TDMs a promising ingredient for a foundation model of control.