We investigate the use of transformer sequence models as dynamics models (TDMs) for control. In a series of experiments in the DeepMind Control Suite, we find, first, that TDMs perform well in a single-environment learning setting when compared to baseline models. Second, TDMs generalize strongly to unseen environments, both in a few-shot setting, where a generalist model is fine-tuned with small amounts of data from the target environment, and in a zero-shot setting, where a generalist model is applied to an unseen environment without any further training. We further demonstrate that generalizing system dynamics can work much better than generalizing optimal behavior directly as a policy. This makes TDMs a promising ingredient for a foundation model of control.