Efficient Language Model Architectures for Differentially Private Federated Learning

Cross-device federated learning (FL) is a technique that trains a model on data distributed across typically millions of edge devices, without the data leaving the devices. SGD is the standard client optimizer for on-device training in cross-device FL, favored for its memory and computational efficiency. However, in centralized training of neural language models, adaptive optimizers are preferred because they offer improved stability and performance. In light of this, we ask whether language models can be modified so that they can be trained efficiently with SGD client optimizers, and we answer this affirmatively. We propose a scale-invariant Coupled Input Forget Gate (SI CIFG) recurrent network by modifying the sigmoid and tanh activations in the recurrent cell, and show in large-scale experiments that this new model converges faster and achieves better utility than the standard CIFG recurrent model in cross-device FL. We further show that the proposed scale-invariant modification also helps in federated learning of larger transformer models. Finally, we demonstrate that the scale-invariant modification is also compatible with other non-adaptive algorithms. In particular, our results suggest an improved privacy-utility trade-off in federated learning with differential privacy.
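The abstract does not spell out how the sigmoid and tanh activations are modified, so the following is only a minimal sketch of one way an activation can be made scale-invariant. It assumes the pre-activation vector is standardized (zero mean, unit standard deviation) before the nonlinearity is applied. The names `standardize`, `si_sigmoid`, `si_tanh`, and `cifg_cell_update` are hypothetical and are not taken from the paper. The cell-state update uses the usual CIFG coupling, in which the forget gate is one minus the input gate; the output gate and hidden state are omitted for brevity.

```python
import math


def standardize(z, eps=1e-12):
    """Shift a vector to zero mean and rescale it to unit standard deviation.

    ASSUMPTION: standardization is one possible way to obtain scale
    invariance; the abstract does not confirm this is the paper's choice.
    """
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    std = math.sqrt(var) + eps  # eps guards against division by zero
    return [(v - mean) / std for v in z]


def si_sigmoid(z):
    """Sigmoid applied to standardized pre-activations (hypothetical)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in standardize(z)]


def si_tanh(z):
    """Tanh applied to standardized pre-activations (hypothetical)."""
    return [math.tanh(v) for v in standardize(z)]


def cifg_cell_update(c_prev, i_pre, g_pre):
    """Coupled input-forget cell-state update.

    c_t = (1 - i_t) * c_{t-1} + i_t * g_t, where the forget gate is
    tied to the input gate as f_t = 1 - i_t.
    """
    i = si_sigmoid(i_pre)
    g = si_tanh(g_pre)
    return [(1.0 - i_k) * c_k + i_k * g_k
            for c_k, i_k, g_k in zip(c_prev, i, g)]


# Multiplying the pre-activations by a positive constant leaves the
# gate values (numerically) unchanged.
pre = [0.5, -1.0, 2.0, 0.1]
scaled = [10.0 * v for v in pre]
max_gap = max(abs(a - b) for a, b in zip(si_sigmoid(pre), si_sigmoid(scaled)))
print(max_gap < 1e-9)
```

Under this assumption, rescaling the weights that produce the pre-activations by a positive constant does not change the gate outputs, which is the kind of property that could make training less sensitive to a fixed SGD learning rate.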

