Membership inference attacks are a key tool for disclosure auditing: they aim to infer whether an individual record was used to train a model. While such evaluations are useful for demonstrating risk, they are computationally expensive and often make strong assumptions about potential adversaries' access to models and training environments, and thus do not provide tight bounds on leakage from potential attacks. We show that prior claims that black-box access suffices for optimal membership inference do not hold for stochastic gradient descent, and that optimal membership inference indeed requires white-box access. Our theoretical results lead to a new white-box inference attack, IHA (Inverse Hessian Attack), that explicitly uses model parameters by computing inverse-Hessian vector products. Our results show that both auditors and adversaries may benefit from access to model parameters, and we advocate for further research into white-box methods for membership inference.
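To make the notion of an inverse-Hessian vector product concrete, the following is a minimal sketch and not the IHA attack itself. It assumes a toy logistic-regression loss with a small damping term (so the Hessian is positive definite) and solves H x = v by conjugate gradient, which needs only Hessian-vector products and never forms or inverts H. The names `hvp`, `inverse_hvp`, and `damping`, as well as the data, are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def hvp(w, X, v, damping=1e-2):
    """Hessian-vector product H v for the mean logistic loss at parameters w.

    H = (1/n) * sum_i s_i (1 - s_i) x_i x_i^T + damping * I,
    where s_i = sigmoid(w . x_i). The damping term is an assumption that
    keeps H positive definite so conjugate gradient applies.
    """
    n = len(X)
    out = [damping * vj for vj in v]
    for x in X:
        s = sigmoid(dot(w, x))
        coeff = s * (1.0 - s) * dot(x, v) / n
        for j, xj in enumerate(x):
            out[j] += coeff * xj
    return out


def inverse_hvp(matvec, v, iters=100, tol=1e-20):
    """Approximate H^{-1} v by conjugate gradient, given only u -> H u."""
    x = [0.0] * len(v)
    r = list(v)  # residual v - H x, with x = 0 initially
    p = list(v)  # search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        Hp = matvec(p)
        alpha = rs / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Toy data and parameters (hypothetical values).
X = [[1.0, 0.5], [0.2, -1.0], [-0.7, 0.3], [1.5, 1.0]]
w = [0.3, -0.2]
v = [1.0, -2.0]

x = inverse_hvp(lambda u: hvp(w, X, u), v)
# Sanity check: H x should reproduce v up to numerical error.
residual = [a - b for a, b in zip(hvp(w, X, x), v)]
```

In a white-box setting, `v` would typically be a gradient evaluated at the model parameters; for larger models the explicit Hessian-vector product above would be replaced by automatic differentiation, which is outside the scope of this sketch.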