Recent work has shown the promise of creating generalist, transformer-based policies for language, vision, and sequential decision-making problems. Creating such models generally requires centralized training objectives, data, and compute. It is therefore of interest whether generalist policies can be created more flexibly by merging multiple task-specific, individually trained policies. In this work, we take a preliminary step in this direction by merging, or averaging in weight space, subsets of Decision Transformers trained on different MuJoCo locomotion problems, forming multi-task models without centralized training. We also propose that merging yields better results when all policies start from a common pre-trained initialization and are co-trained on shared auxiliary tasks during problem-specific finetuning. In general, we believe research in this direction can help democratize and distribute the process by which generally capable agents are formed.
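As a rough illustration of what "merging, or averaging, in weight space" means, the sketch below takes the element-wise mean of each parameter across several policies. It assumes that every policy shares the same architecture and that its parameters are stored as a mapping from a parameter name to a flat list of floats; the `merge_policies` function and the parameter names are hypothetical. Uniform averaging is only the simplest merging scheme, and this sketch does not model the shared pre-trained initialization or the auxiliary co-training described above.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weights.
StateDict = Dict[str, List[float]]


def merge_policies(policies: List[StateDict]) -> StateDict:
    """Uniformly average the weights of identically shaped policies."""
    if not policies:
        raise ValueError("need at least one policy to merge")
    reference = policies[0]
    for p in policies[1:]:
        # Weight-space merging only makes sense when every policy has
        # the same parameter names and shapes (i.e. the same architecture).
        if p.keys() != reference.keys():
            raise ValueError("policies have different parameter names")
        for name, weights in p.items():
            if len(weights) != len(reference[name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")
    n = len(policies)
    merged: StateDict = {}
    for name in reference:
        # Element-wise mean of this parameter across all policies.
        merged[name] = [sum(vals) / n for vals in zip(*(p[name] for p in policies))]
    return merged


if __name__ == "__main__":
    # Two toy "task-specific" policies with identical parameter layouts.
    policy_a = {"attn.w": [0.5, 0.25], "head.b": [1.0]}
    policy_b = {"attn.w": [1.5, 0.75], "head.b": [3.0]}
    print(merge_policies([policy_a, policy_b]))
    # prints {'attn.w': [1.0, 0.5], 'head.b': [2.0]}
```

The shape checks are there because averaging only produces a usable model when every parameter lines up one-to-one across policies. A real implementation would typically operate on framework tensors rather than Python lists.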