Task Arithmetic is a model merging technique that combines the capabilities of multiple models into a single model through simple arithmetic in the weight space, without additional fine-tuning or access to the original training data. However, the factors that determine the success of Task Arithmetic remain unclear. In this paper, we examine Task Arithmetic for multi-task learning by framing it as a one-shot Federated Learning problem. We demonstrate that Task Arithmetic is mathematically equivalent to a widely used algorithm in Federated Learning, Federated Averaging (FedAvg). By leveraging well-established theoretical results on FedAvg, we identify two key factors that impact the performance of Task Arithmetic: data heterogeneity and training heterogeneity. To mitigate these challenges, we adapt several algorithms from Federated Learning to improve the effectiveness of Task Arithmetic. Our experiments demonstrate that applying these algorithms often significantly boosts the performance of the merged model compared to the original Task Arithmetic approach. This work bridges Task Arithmetic and Federated Learning, offering new theoretical perspectives on Task Arithmetic and improved practical methodologies for model merging.
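The claimed equivalence can be illustrated with a minimal numerical sketch. Assuming a shared pretrained initialization `theta0` and fine-tuned models `thetas` (all names here are illustrative, not from the paper): Task Arithmetic adds scaled task vectors (the difference between each fine-tuned model and the initialization) to `theta0`, and with scaling coefficient `lam = 1/n` this coincides with one round of FedAvg with equal client weights, i.e. a plain average of the client models.

```python
import numpy as np

# Hypothetical setup: a pretrained model theta0 and n models fine-tuned from it.
rng = np.random.default_rng(0)
theta0 = rng.normal(size=5)                               # shared pretrained weights
thetas = [theta0 + rng.normal(size=5) for _ in range(3)]  # n = 3 fine-tuned models

# Task Arithmetic: merged = theta0 + lam * sum_i (theta_i - theta0).
lam = 1.0 / len(thetas)                 # choose the scaling coefficient as 1/n
task_vectors = [t - theta0 for t in thetas]
merged_ta = theta0 + lam * sum(task_vectors)

# One-shot FedAvg with equal client weights: merged = mean_i theta_i.
merged_fedavg = np.mean(thetas, axis=0)

# The two merged models are identical, term by term.
assert np.allclose(merged_ta, merged_fedavg)
```

For other choices of the scaling coefficient, Task Arithmetic corresponds to FedAvg with a rescaled update step rather than a plain average.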