Zero redundancy distributed learning with differential privacy

Deep learning with large models has achieved great success across a wide range of domains. However, training models with billions of parameters is very challenging in terms of training speed, memory cost, and communication efficiency, especially in the privacy-preserving regime of differential privacy (DP). On the one hand, DP optimization is comparably efficient to standard non-private optimization on a single GPU, but on multiple GPUs, existing DP distributed learning (such as pipeline parallelism) suffers from significantly worse efficiency. On the other hand, the Zero Redundancy Optimizer (ZeRO) is a state-of-the-art solution for standard distributed learning and exhibits excellent training efficiency on large models, but making it compatible with DP is technically complicated. In this work, we develop a new systematic solution, DP-ZeRO, (I) to scale up the trainable DP model size, e.g. to GPT-100B, (II) to obtain the same computation and communication efficiency as standard ZeRO, and (III) to enable mixed-precision DP training. DP-ZeRO, like standard ZeRO, has the potential to train models of arbitrary size and is evaluated on the world's largest DP models in terms of the number of trainable parameters.
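
For readers unfamiliar with the two ingredients named above, the following is a minimal, illustrative sketch and not the DP-ZeRO implementation. It assumes the common DP-SGD recipe (clip each per-sample gradient to a fixed L2 norm, sum, add Gaussian noise, average) and models ZeRO-style partitioning as splitting a flat vector into contiguous per-rank shards. All function names and parameters (`clip_per_sample`, `dp_private_gradient`, `shard`, `max_norm`, `noise_multiplier`) are hypothetical and chosen only for illustration.

```python
import math
import random


def clip_per_sample(grads, max_norm):
    """Scale each per-sample gradient so its L2 norm is at most max_norm."""
    clipped = []
    for g in grads:
        norm = math.sqrt(sum(x * x for x in g))
        # Factor is 1.0 when the gradient is already within the bound.
        factor = min(1.0, max_norm / (norm + 1e-12))
        clipped.append([x * factor for x in g])
    return clipped


def dp_private_gradient(grads, max_norm, noise_multiplier, rng):
    """Sum clipped per-sample gradients, add Gaussian noise, then average."""
    clipped = clip_per_sample(grads, max_norm)
    dim = len(grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise scale is tied to the clipping bound (assumed DP-SGD convention).
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(grads) for v in noisy]


def shard(vector, num_ranks):
    """Split a flat vector into contiguous shards, one per rank."""
    size = math.ceil(len(vector) / num_ranks)
    return [vector[r * size:(r + 1) * size] for r in range(num_ranks)]


if __name__ == "__main__":
    rng = random.Random(0)
    per_sample_grads = [[3.0, 4.0], [0.3, 0.4]]
    private = dp_private_gradient(per_sample_grads, 1.0, 1.0, rng)
    # Each simulated rank would hold only its own slice of the result.
    print(shard(private, 2))
```

In this toy version the clipping and noising happen on the full gradient before it is split. How a real system interleaves per-sample clipping with partitioned gradients and mixed precision is the technical difficulty the abstract refers to, and it is not reproduced here.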

