Deep learning with differential privacy (DP) has garnered significant attention in recent years, leading to the development of numerous methods aimed at improving model accuracy and training efficiency. This paper studies the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training a DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. This `reduction' is achieved by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To address these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment that can advance research in differentially private deep learning.