Federated Fine-Tuning of Foundation Models via Probabilistic Masking

Foundation Models (FMs) have revolutionized machine learning with their adaptability and high performance across tasks, yet integrating them into Federated Learning (FL) is challenging because their extensive parameterization incurs substantial communication overhead. Current communication-efficient FL strategies, such as gradient compression, reduce bitrates to around 1 bit per parameter (bpp). However, these approaches fail to harness the characteristics of FMs, whose large number of parameters still poses a challenge to communication efficiency even at these bitrates. In this work, we present DeltaMask, a novel method that efficiently fine-tunes FMs in FL at an ultra-low bitrate, well below 1 bpp. DeltaMask employs stochastic masking to detect highly effective subnetworks within FMs, and leverages the stochasticity and sparsity of client masks to compress updates into a compact grayscale image using probabilistic filters, deviating from traditional weight-training approaches. Evaluations on 8 datasets and 5 pre-trained models of various network architectures show that DeltaMask achieves bitrates as low as 0.09 bpp, improving communication efficiency while maintaining FM performance.
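To make the bitrate figures concrete, the following is a minimal sketch of how a sub-1-bpp rate can arise when a client sends only the positions where its binary mask changed, rather than the full mask. It is illustrative arithmetic and not the authors' implementation: the function names, the uniform mask probabilities, the 1% change rate, and the cost of 8 bits per stored entry are all assumptions chosen for the example, and no real probabilistic filter is built here.

```python
import random


def sample_mask(probs, rng):
    """Draw a binary mask: entry i is 1 with probability probs[i]."""
    return [1 if rng.random() < p else 0 for p in probs]


def mask_delta(prev_mask, new_mask):
    """Return the indices at which the two masks differ."""
    return [i for i, (a, b) in enumerate(zip(prev_mask, new_mask)) if a != b]


def bits_per_parameter(num_changed, num_params, bits_per_entry=8):
    """Approximate bitrate if each changed index costs `bits_per_entry` bits.

    `bits_per_entry=8` is an assumed per-entry cost for a compact
    membership structure, not a figure taken from the abstract.
    """
    return num_changed * bits_per_entry / num_params


rng = random.Random(0)
num_params = 10_000

# Hypothetical mask probabilities; a real client would learn these.
probs = [0.5] * num_params
prev_mask = sample_mask(probs, rng)

# Assume about 1% of mask entries flip between two rounds.
new_mask = list(prev_mask)
for i in rng.sample(range(num_params), 100):
    new_mask[i] = 1 - new_mask[i]

delta = mask_delta(prev_mask, new_mask)
bpp = bits_per_parameter(len(delta), num_params)

print(f"changed entries: {len(delta)} of {num_params}")
print(f"approximate bitrate: {bpp:.2f} bpp (dense binary mask: 1.00 bpp)")
```

Under these assumptions the cost scales with the number of changed entries instead of the number of parameters, which is why a sparse mask update can fall well below the 1 bpp needed to send a dense binary mask.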

