Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, in which private training data can be reconstructed by analyzing the gradients of models shared during training. Previous attacks established that such reconstruction is possible using the gradients of all parameters in the entire model. However, we hypothesize that most of the involved modules, and even their sub-modules, are at risk of training data leakage, and we validate these vulnerabilities across various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even from a single linear component holding only 0.54% of the parameters, are susceptible to training data leakage. We further show that applying differential privacy to gradients during training offers only limited protection against this novel vulnerability.
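For readers unfamiliar with the attack, the sketch below illustrates the core idea of gradient-matching inversion in the style of Deep Leakage from Gradients (Zhu et al., 2019): the attacker optimizes a dummy input until its gradient matches the gradient the victim shared. The toy model, input shapes, and optimizer settings are illustrative assumptions, not the specific attack or models studied in this paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy victim model; the paper targets language models.
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 4))
loss_fn = nn.CrossEntropyLoss()

# The victim computes a gradient on private data and shares it.
x_true = torch.randn(1, 32)
y_true = torch.tensor([2])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true),
                                 model.parameters())

# The attacker optimizes a dummy input so that its gradient matches
# the shared one; the label is assumed known (often recoverable).
x_dummy = torch.randn(1, 32, requires_grad=True)
optimizer = torch.optim.LBFGS([x_dummy])

def closure():
    optimizer.zero_grad()
    dummy_loss = loss_fn(model(x_dummy), y_true)
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(),
                                      create_graph=True)
    # Squared L2 distance between dummy and shared gradients.
    grad_diff = sum(((dg - tg) ** 2).sum()
                    for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(30):
    optimizer.step(closure)

print("reconstruction L2 error:", (x_dummy - x_true).norm().item())
```

Restricting `true_grads` to the parameters of a single layer corresponds to the partial-gradient setting examined in this work.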