Machine learning models can be trained with formal privacy guarantees via differentially private optimizers such as DP-SGD. In this work, we study such privacy guarantees when the adversary can access only the final model, i.e., intermediate model updates are not released. In the existing literature, this hidden state threat model exhibits a significant gap between the lower bound provided by empirical privacy auditing and the theoretical upper bound provided by privacy accounting. To narrow this gap, we propose to audit this threat model with adversaries that craft a gradient sequence designed to maximize the privacy loss of the final model without accessing intermediate models. We demonstrate experimentally that this approach consistently outperforms prior attempts at auditing the hidden state model. When the crafted gradient is inserted at every optimization step, our results imply that releasing only the final model does not amplify privacy, providing a novel negative result. On the other hand, when the crafted gradient is not inserted at every step, we present strong evidence that a privacy amplification phenomenon emerges in the general non-convex setting (albeit weaker than in convex regimes), suggesting that existing privacy upper bounds can be improved.
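To make the every-step case concrete, the following toy one-dimensional sketch illustrates why hiding intermediate iterates need not help. It is not the paper's auditing procedure: it assumes full-batch DP-SGD without subsampling, a scalar model, benign gradients equal to zero, and a crafted canary gradient fixed at +C on every step (all hypothetical simplifications, and the helper names are illustrative). Under these assumptions the final parameter is Gaussian with the same variance with or without the canary, shifted by ηTC, so distinguishing the two hypotheses from the final model alone is a μ-GDP test with μ = √T/σ. This matches the value obtained by composing the T per-step (1/σ)-GDP Gaussian mechanisms, i.e., the final model leaks as much as the full trajectory in this configuration.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gdp_delta(mu: float, eps: float) -> float:
    """delta(eps) of a mu-GDP mechanism (Dong, Roth & Su conversion)."""
    return normal_cdf(-eps / mu + mu / 2) - math.exp(eps) * normal_cdf(-eps / mu - mu / 2)


def final_model(steps: int, sigma: float, clip: float, lr: float,
                with_canary: bool, rng: random.Random) -> float:
    """Toy 1-D full-batch DP-SGD that releases only the final parameter.

    Hypothetical simplification: all non-canary gradients are 0, and the
    crafted canary gradient is +clip (already at the clipping norm) at every step.
    """
    theta = 0.0
    for _ in range(steps):
        grad_sum = clip if with_canary else 0.0
        noise = rng.gauss(0.0, sigma * clip)  # Gaussian noise with std sigma * C
        theta -= lr * (grad_sum + noise)
    return theta


def hidden_state_mu(steps: int, sigma: float) -> float:
    """Mean shift divided by std of the final model: (lr*T*C) / (lr*sigma*C*sqrt(T))."""
    return steps / (sigma * math.sqrt(steps))


def composed_mu(steps: int, sigma: float) -> float:
    """GDP composition of T Gaussian steps, each of which is (1/sigma)-GDP."""
    return math.sqrt(steps) * (1.0 / sigma)


if __name__ == "__main__":
    T, sigma = 100, 2.0
    print(hidden_state_mu(T, sigma), composed_mu(T, sigma))  # 5.0 5.0
    print(gdp_delta(composed_mu(T, sigma), 1.0))  # delta at eps = 1
    rng = random.Random(0)
    runs_with = [final_model(T, sigma, 1.0, 0.1, True, rng) for _ in range(2000)]
    runs_without = [final_model(T, sigma, 1.0, 0.1, False, rng) for _ in range(2000)]
    # Empirical mean shift should be close to -lr * T * C = -10.
    print(sum(runs_with) / len(runs_with) - sum(runs_without) / len(runs_without))
```

When the canary is inserted only on some steps, the mean shift shrinks while the noise accumulated over all T steps does not, which is the regime where the abstract reports evidence of amplification; this toy model deliberately does not attempt to capture that non-convex behavior.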