Membership inference attacks aim to infer whether an individual record was used to train a model, and they serve as a key tool for disclosure auditing. While such evaluations are useful for demonstrating risk, they are computationally expensive and often make strong assumptions about a potential adversary's access to models and training environments; as a result, they do not provide tight bounds on leakage from potential attacks. We show that prior claims that black-box access suffices for optimal membership inference do not hold in most useful settings, such as stochastic gradient descent, and that optimal membership inference in fact requires white-box access. We validate our findings with IHA (Inverse Hessian Attack), a new white-box inference attack that explicitly uses model parameters by computing inverse-Hessian vector products. Our results show that both audits and adversaries may benefit from access to model parameters, and we advocate for further research into white-box methods for membership privacy auditing.
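The abstract does not say how IHA computes inverse-Hessian vector products (iHVPs). As an illustration only, the sketch below shows one standard way to obtain an iHVP without materializing the Hessian: run conjugate gradient using only Hessian-vector products. The choice of model (an L2-damped mean logistic-regression loss), the damping value, and the function names `hvp` and `inverse_hvp_cg` are assumptions made for this example. This is not the paper's attack or implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hvp(theta, X, v, damping):
    """Hessian-vector product (H + damping * I) v for the mean logistic loss.

    Assumed model for illustration: H = X^T diag(p * (1 - p)) X / n,
    where p = sigmoid(X @ theta). The d x d Hessian is never formed.
    """
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)  # per-example curvature weights
    return X.T @ (w * (X @ v)) / X.shape[0] + damping * v


def inverse_hvp_cg(theta, X, v, damping=1e-2, tol=1e-10, max_iter=100):
    """Approximate (H + damping * I)^{-1} v with conjugate gradient.

    CG applies because the damped Hessian is symmetric positive definite.
    """
    x = np.zeros_like(v)
    r = v - hvp(theta, X, x, damping)  # residual of the linear system
    d = r.copy()                       # search direction
    rs_old = r @ r
    if np.sqrt(rs_old) < tol:          # v is (numerically) zero
        return x
    for _ in range(max_iter):
        Hd = hvp(theta, X, d, damping)
        alpha = rs_old / (d @ Hd)
        x += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs_old) * d
        rs_old = rs_new
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    theta = rng.normal(size=5)
    v = rng.normal(size=5)
    x = inverse_hvp_cg(theta, X, v)
    # Residual norm of the solved system; should be close to zero.
    print(np.linalg.norm(hvp(theta, X, x, 1e-2) - v))
```

Damping keeps the system well conditioned when the Hessian is singular or nearly so. For a real network, the same CG loop works with Hessian-vector products supplied by automatic differentiation in place of the closed-form `hvp` above.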