Model inversion (MI) attacks, which can reconstruct training data from public models, have raised increasing privacy concerns. MI attacks can be formalized as an optimization problem that searches for private data in a certain space. Recent MI attacks leverage a generative adversarial network (GAN) as an image prior to narrow the search space, and can successfully reconstruct even high-dimensional data (e.g., face images). However, these generative MI attacks do not fully exploit the capabilities of the target model, which leaves the search space vague and coupled, i.e., images of different classes are entangled in the search space. In addition, the cross-entropy loss widely used in these attacks suffers from vanishing gradients. To address these problems, we propose the Pseudo Label-Guided MI (PLG-MI) attack via a conditional GAN (cGAN). First, a top-n selection strategy provides pseudo-labels for public data, and these pseudo-labels guide the training of the cGAN. In this way, the search space is decoupled across image classes. Then, a max-margin loss is introduced to improve the search within the subspace of a target class. Extensive experiments demonstrate that our PLG-MI attack significantly improves the attack success rate and visual quality across various datasets and models, and is notably 2~3 $\times$ better than state-of-the-art attacks under large distributional shifts. Our code is available at: https://github.com/LetheSec/PLG-MI-Attack.
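The two ingredients named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes that top-n selection ranks public samples by the target model's confidence for each class, and that the max-margin loss takes the common form "largest non-target logit minus target logit". All function names here are hypothetical.

```python
import math


def top_n_pseudo_labels(probs, n):
    """Assumed selection rule: for each class k, keep the n public samples
    to which the target model assigns the highest probability for k.

    probs: list of per-sample probability vectors from the target model.
    Returns a dict mapping class index -> list of selected sample indices.
    """
    num_classes = len(probs[0])
    selected = {}
    for k in range(num_classes):
        ranked = sorted(range(len(probs)), key=lambda i: probs[i][k], reverse=True)
        selected[k] = ranked[:n]  # these samples receive pseudo-label k
    return selected


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad_target(logits, target):
    """Gradient of -log softmax(logits)[target] w.r.t. the target logit.

    It equals p_target - 1, which shrinks toward 0 as p_target approaches 1.
    """
    return softmax(logits)[target] - 1.0


def max_margin_loss(logits, target):
    """Assumed max-margin form: largest non-target logit minus target logit.

    Its gradient w.r.t. the target logit is a constant -1, so it does not
    vanish when the target class is already confidently predicted.
    """
    best_other = max(z for j, z in enumerate(logits) if j != target)
    return best_other - logits[target]


if __name__ == "__main__":
    # Toy target-model outputs for four public samples and two classes.
    probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
    print(top_n_pseudo_labels(probs, n=2))

    confident_logits = [20.0, 0.0, 0.0]
    print(cross_entropy_grad_target(confident_logits, 0))  # close to zero
    print(max_margin_loss(confident_logits, 0))
```

In this sketch a public sample may be selected for more than one class; whether and how such overlaps are handled is a design choice the abstract does not specify.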