Recent studies have shown that distributed machine learning is vulnerable to gradient inversion attacks, in which private training data can be reconstructed by analyzing the gradients shared during training. Previous attacks established that such reconstruction is possible using gradients from all parameters of the entire model. However, we hypothesize that most of the involved modules, and even their sub-modules, are at risk of training data leakage, and we validate such vulnerabilities in various intermediate layers of language models. Our extensive experiments reveal that gradients from a single Transformer layer, or even a single linear component holding 0.54% of the parameters, are susceptible to training data leakage. Additionally, we show that applying differential privacy to gradients during training offers limited protection against this novel vulnerability of data disclosure.
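To give intuition for why even a single linear component can leak its input, consider the textbook observation (not the attack studied in this work, and using hypothetical dimensions chosen for illustration) that for a linear layer the weight gradient is an outer product of the output gradient and the input, so any row of the weight gradient with a nonzero bias gradient reveals the input exactly. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 8, 4  # illustrative sizes, not from the paper

# A single linear layer y = W x + b followed by softmax cross-entropy loss.
W = rng.normal(size=(d_out, d_in))
b = rng.normal(size=d_out)
x = rng.normal(size=d_in)   # the private training example
target = 2                  # its label

# Forward pass and analytic gradients of the loss L.
logits = W @ x + b
p = np.exp(logits - logits.max())
p /= p.sum()
dlogits = p.copy()
dlogits[target] -= 1.0          # dL/dlogits for cross-entropy
dW = np.outer(dlogits, x)       # dL/dW = (dL/dlogits) x^T
db = dlogits                    # dL/db = dL/dlogits

# Inversion: row i of dW equals db[i] * x, so any row with db[i] != 0
# reconstructs the input from the shared gradients alone.
i = int(np.argmax(np.abs(db)))
x_recovered = dW[i] / db[i]

print(np.allclose(x_recovered, x))  # True: input recovered exactly
```

This closed-form recovery only applies to a lone linear layer with a single example; reconstructing data from gradients of intermediate Transformer layers, as in the experiments above, requires iterative optimization-based inversion.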