How to Merge Your Multimodal Models Over Time?

Model merging combines multiple expert models - finetuned from a base foundation model on diverse tasks and domains - into a single, more capable model. However, most existing model merging approaches assume that all experts are available simultaneously. In reality, new tasks and domains emerge progressively over time, requiring strategies to integrate the knowledge of expert models as they become available: a process we call temporal model merging. The temporal dimension introduces unique challenges not addressed in prior work, raising new questions such as: when training for a new task, should the expert model start from the merged past experts or from the original base model? Should we merge all models at each time step? Which merging techniques are best suited for temporal merging? Should different strategies be used to initialize the training and deploy the model? To answer these questions, we propose a unified framework called TIME - Temporal Integration of Model Expertise - which defines temporal model merging across three axes: (1) Initialization Phase, (2) Deployment Phase, and (3) Merging Technique. Using TIME, we study temporal model merging across model sizes, compute budgets, and learning horizons on the FoMo-in-Flux benchmark. Our comprehensive suite of experiments across TIME allows us to uncover key insights for temporal model merging, offering a better understanding of current challenges and best practices for effective temporal model merging.
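
The three axes named above can be made concrete with a toy sketch. This is not the authors' implementation: it is one possible reading of the abstract, in which a model is a plain dictionary of floats, "fine-tuning" is a hypothetical stub that nudges one parameter toward a target, and uniform weight averaging stands in for whichever merging technique is being studied. The names `temporal_merge`, `init_from_merged`, `merge_all_experts`, and `make_task` are illustrative assumptions, not identifiers from the paper.

```python
from typing import Callable, Dict, List

Weights = Dict[str, float]  # toy stand-in for a model's parameter tensors
Merge = Callable[[List[Weights]], Weights]


def average_merge(models: List[Weights]) -> Weights:
    """Uniform weight averaging, used here only as a stand-in merging technique."""
    keys = models[0].keys()
    return {k: sum(m[k] for m in models) / len(models) for k in keys}


def temporal_merge(
    base: Weights,
    tasks: List[Callable[[Weights], Weights]],
    init_from_merged: bool,
    merge_all_experts: bool,
    merge: Merge = average_merge,
) -> Weights:
    """Process tasks in arrival order and return the final deployed weights."""
    experts: List[Weights] = []
    deployed = base
    for train in tasks:
        # Initialization axis (assumed reading): start the new expert from
        # the current merged model, or from the original base model.
        start = deployed if init_from_merged else base
        expert = train(start)
        experts.append(expert)
        # Deployment axis (assumed reading): merge every expert seen so far,
        # or only the previously deployed model with the newest expert.
        if merge_all_experts:
            deployed = merge(experts)
        else:
            deployed = merge([deployed, expert])
    return deployed


def make_task(key: str, target: float) -> Callable[[Weights], Weights]:
    """Hypothetical 'fine-tuning': move one parameter halfway to a target."""
    def train(start: Weights) -> Weights:
        out = dict(start)
        out[key] = start[key] + 0.5 * (target - start[key])
        return out
    return train


base = {"a": 0.0, "b": 0.0}
tasks = [make_task("a", 4.0), make_task("b", 8.0)]

from_base = temporal_merge(base, tasks, init_from_merged=False, merge_all_experts=True)
from_merged = temporal_merge(base, tasks, init_from_merged=True, merge_all_experts=True)
print(from_base)
print(from_merged)
```

Even in this toy setting the two initialization choices give different deployed weights for the same task sequence, which is the kind of interaction between axes that the abstract says TIME is designed to study. Swapping `average_merge` for another function with the same signature varies the third axis without touching the loop.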
