Model merging combines multiple expert models, each fine-tuned from a base foundation model on diverse tasks and domains, into a single, more capable model. However, most existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge progressively over time, requiring strategies to integrate the knowledge of expert models as they become available: a process we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work, raising new questions such as: when training for a new task, should the expert model start from the merged past experts or from the original base model? Should we merge all models at each time step? Which merging techniques are best suited for temporal merging? Should different strategies be used to initialize training and to deploy the model? To answer these questions, we propose a unified framework called TIME (Temporal Integration of Model Expertise), which defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark. Our comprehensive suite of experiments across TIME uncovers key insights for temporal model merging, offering a better understanding of current challenges and best practices.
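To make the three axes concrete, the following is a minimal sketch of a temporal merging loop, not the implementation used in the paper. It assumes a toy model represented as a dictionary of scalar weights, uniform weight averaging as the merging technique, and hypothetical option names (`init_from`, `deploy_with`) for the initialization and deployment choices; `toy_train` is a stand-in for actual fine-tuning.

```python
from typing import Callable, Dict, List

# Toy stand-in for model parameters (assumption: one float per parameter name).
Weights = Dict[str, float]


def average_merge(models: List[Weights]) -> Weights:
    """Uniform weight averaging, used here as one simple merging technique."""
    keys = models[0].keys()
    return {k: sum(m[k] for m in models) / len(models) for k in keys}


def temporal_merge(
    base: Weights,
    train_expert: Callable[[Weights, int], Weights],
    num_tasks: int,
    init_from: str = "merged",   # hypothetical: "merged" or "base"
    deploy_with: str = "all",    # hypothetical: "all" or "latest"
    merge: Callable[[List[Weights]], Weights] = average_merge,
) -> List[Weights]:
    """Return the deployed model after each time step."""
    experts: List[Weights] = []
    deployed: Weights = dict(base)
    history: List[Weights] = []
    for t in range(num_tasks):
        # Axis 1, initialization: start from the merged model or the base model.
        start = deployed if init_from == "merged" else base
        expert = train_expert(dict(start), t)
        experts.append(expert)
        # Axis 2, deployment: merge all experts so far, or only the
        # previously deployed model with the newest expert.
        if deploy_with == "all":
            deployed = merge(experts)
        else:
            deployed = merge([deployed, expert])
        history.append(deployed)
    return history


def toy_train(start: Weights, task_id: int) -> Weights:
    """Placeholder for fine-tuning: shifts every weight by 1.0."""
    return {k: v + 1.0 for k, v in start.items()}


if __name__ == "__main__":
    for init in ("base", "merged"):
        for deploy in ("all", "latest"):
            result = temporal_merge({"w": 0.0}, toy_train, 3, init, deploy)
            print(init, deploy, [round(m["w"], 3) for m in result])
```

Axis 3, the merging technique, is the `merge` argument: any function that maps a list of weight dictionaries to one can be swapped in for `average_merge`, which keeps the three choices independent of one another in this sketch.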