Federated Split Learning (FSL) is a promising distributed learning paradigm that combines the strengths of Federated Learning (FL) and Split Learning (SL) to preserve model privacy while reducing the resource overhead of each client, which is especially valuable for large transformer models in resource-constrained environments such as the Internet of Things (IoT). However, almost all existing work evaluates FSL only with simple neural network models. The few efforts that adopt Vision Transformers (ViTs) as the model architecture train them from scratch, incurring enormous training overhead on each resource-limited device. In this paper, we therefore harness Pre-trained Image Transformers (PITs) as the initial model, in a framework coined FedV, to accelerate training and improve model robustness. Furthermore, we propose FedVZ to defend against gradient inversion attacks; notably, it remains compatible with black-box scenarios where gradient information is unavailable. Concretely, FedVZ approximates the server-side gradient with zeroth-order (ZO) optimization, which replaces backward propagation with just one forward pass. Empirically, we are the first to provide a systematic evaluation of FSL methods with PITs on real-world datasets, under different levels of partial device participation, and across heterogeneous data splits. Our experiments verify the effectiveness of our algorithms.
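To illustrate the idea behind the ZO approximation, the following is a minimal sketch (not the paper's implementation) of a one-point zeroth-order gradient estimator: the loss is evaluated once at a randomly perturbed point, and that single forward evaluation yields an unbiased estimate of the smoothed loss's gradient, with no backward pass. The function names and the toy quadratic loss are illustrative assumptions.

```python
import numpy as np

def zo_gradient_one_point(loss_fn, theta, mu=0.5, rng=None):
    """One-point ZO gradient estimate: g = loss_fn(theta + mu*u) * u / mu,
    where u is a random Gaussian direction. Only a single forward
    evaluation of loss_fn is needed; no backpropagation is performed."""
    rng = rng if rng is not None else np.random.default_rng(0)
    u = rng.standard_normal(theta.shape)
    return loss_fn(theta + mu * u) * u / mu

# Toy loss f(theta) = 0.5 * ||theta||^2, whose true gradient is theta.
loss = lambda th: 0.5 * np.dot(th, th)
theta = np.array([1.0, -2.0])

# A single one-point estimate is high-variance, so in practice many
# perturbations (or momentum across steps) are averaged; here we
# average 20000 estimates with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
est = np.mean(
    [zo_gradient_one_point(loss, theta, mu=0.5, rng=rng) for _ in range(20000)],
    axis=0,
)
```

The averaged estimate approaches the true gradient `[1.0, -2.0]`; the variance-vs-queries trade-off is the usual cost of replacing backpropagation with forward-only ZO queries.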