Will it Merge? On The Causes of Model Mergeability - Zhuanzhi Papers


Fine-tuning · Knowledge · Large Models · Synthesis · Multitask Models


Adir Rahamim, Asaf Yehudai, Boaz Carmeli, Leshem Choshen, Yosi Mass, Yonatan Belinkov

Model merging has emerged as a promising technique for combining multiple fine-tuned models into a single multitask model without retraining. However, the factors that determine whether merging will succeed or fail remain poorly understood. In this work, we investigate why specific models are merged better than others. To do so, we propose a concrete, measurable definition of mergeability. We investigate several potential causes for high or low mergeability, highlighting the base model knowledge as a dominant factor: Models fine-tuned on instances that the base model knows better are more mergeable than models fine-tuned on instances that the base model struggles with. Based on our mergeability definition, we explore a simple weighted merging technique that better preserves weak knowledge in the base model.
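The abstract does not spell out the exact mergeability definition or the weighting scheme. The following is a minimal sketch that assumes task-arithmetic-style merging, in which each fine-tuned model contributes its task vector (its weights minus the base model's weights), scaled by a per-model weight and added back onto the base. The names `weighted_merge` and `mergeability` and the choice of weights are hypothetical illustrations, not the authors' method. The `mergeability` ratio (merged score divided by fine-tuned score) is likewise only one assumed way to make "how well a model survives merging" measurable.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def task_vector(base: Params, finetuned: Params) -> Params:
    """Element-wise difference between a fine-tuned model and its base model."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def weighted_merge(
    base: Params, finetuned_models: Sequence[Params], weights: Sequence[float]
) -> Params:
    """Add a weighted sum of task vectors back onto the base model.

    Assumption for illustration: the weights are free parameters, e.g. set
    larger for models fine-tuned on instances the base model handles poorly.
    """
    if len(finetuned_models) != len(weights):
        raise ValueError("need exactly one weight per fine-tuned model")
    merged = {k: list(v) for k, v in base.items()}  # copy so base is untouched
    for model, w in zip(finetuned_models, weights):
        tv = task_vector(base, model)
        for k in merged:
            merged[k] = [m + w * t for m, t in zip(merged[k], tv[k])]
    return merged


def mergeability(merged_score: float, finetuned_score: float) -> float:
    """Hypothetical score: fraction of fine-tuned task performance kept after merging."""
    if finetuned_score == 0:
        raise ValueError("fine-tuned score must be non-zero")
    return merged_score / finetuned_score


if __name__ == "__main__":
    base = {"w": [0.0, 0.0]}
    model_a = {"w": [1.0, 0.0]}
    model_b = {"w": [0.0, 2.0]}
    print(weighted_merge(base, [model_a, model_b], [0.5, 0.5]))
```

With equal weights of 1/N, this reduces to plain parameter averaging of the fine-tuned models (the example above yields `{'w': [0.5, 1.0]}`). Under this sketch, acting on the paper's finding might mean increasing the weight of a model fine-tuned on instances the base model struggles with, so that its weaker knowledge is less diluted by the merge.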

