Model inversion attacks are a type of privacy attack that reconstructs the private data used to train a machine learning model using only access to the model. Recently, white-box model inversion attacks that leverage Generative Adversarial Networks (GANs) to distill knowledge from public datasets have received considerable attention because of their strong attack performance. In contrast, current GAN-based black-box model inversion attacks either cannot guarantee that the attack process completes within a predetermined number of queries or fail to match the performance of white-box attacks. To overcome these limitations, we propose a reinforcement learning-based black-box model inversion attack. We formulate the latent space search as a Markov Decision Process (MDP) and solve it with reinforcement learning. Our method uses the confidence scores that the target model assigns to the generated images as rewards for the agent. Finally, the private data is reconstructed from the latent vectors found by the agent trained in the MDP. Experimental results on various datasets and models demonstrate that our attack successfully recovers private information from the target model, achieving state-of-the-art attack performance. By proposing a more advanced black-box model inversion attack, we highlight the importance of research on privacy-preserving machine learning.
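The abstract states only that the latent space search is cast as an MDP whose rewards come from the target model's confidence scores. The following is a minimal sketch of such an environment under our own assumptions, not the authors' implementation: the state is the current latent vector z, the action is a latent-sized vector a, the transition is assumed to be z' = alpha * z + (1 - alpha) * a, and the reward is taken to be the raw target-class confidence. `toy_generator` and `toy_target_model` are hypothetical stand-ins for a GAN generator and a black-box classifier, and a random policy takes the place of the trained reinforcement learning agent.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def toy_generator(z: Vector) -> Vector:
    # Hypothetical stand-in for a GAN generator: the "image" is the latent itself.
    return list(z)


def toy_target_model(image: Vector) -> Vector:
    # Hypothetical black-box classifier: returns only confidence scores
    # (softmax over two linear logits), never gradients.
    logits = [image[0], image[1]]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LatentSearchMDP:
    """Assumed MDP over a GAN latent space (illustrative only)."""

    def __init__(self, generator: Callable[[Vector], Vector],
                 target_model: Callable[[Vector], Vector],
                 target_class: int, dim: int,
                 alpha: float = 0.9, horizon: int = 10, seed: int = 0):
        self.generator = generator
        self.target_model = target_model
        self.target_class = target_class
        self.dim = dim
        self.alpha = alpha          # assumed mixing weight for the transition
        self.horizon = horizon      # fixed episode length bounds the queries
        self.rng = random.Random(seed)
        self.z: Vector = []
        self.t = horizon            # episode must start with reset()

    def reset(self) -> Vector:
        # Initial state: a latent vector sampled from a standard normal prior.
        self.z = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        self.t = 0
        return list(self.z)

    def step(self, action: Vector) -> Tuple[Vector, float, bool]:
        if self.t >= self.horizon:
            raise RuntimeError("episode finished; call reset()")
        if len(action) != self.dim:
            raise ValueError("action must have the latent dimension")
        # Assumed transition: blend the current latent with the action.
        self.z = [self.alpha * zi + (1.0 - self.alpha) * ai
                  for zi, ai in zip(self.z, action)]
        self.t += 1
        # One black-box query: generate an image and read its confidence scores.
        scores = self.target_model(self.generator(self.z))
        reward = scores[self.target_class]
        return list(self.z), reward, self.t >= self.horizon


def rollout_random_policy(env: LatentSearchMDP, seed: int = 1) -> Tuple[Vector, float]:
    # Placeholder for a trained agent: sample random actions and keep
    # the latent vector with the highest target-class confidence.
    rng = random.Random(seed)
    z = env.reset()
    best_z, best_r = list(z), float("-inf")
    done = False
    while not done:
        action = [rng.gauss(0.0, 1.0) for _ in range(env.dim)]
        z, r, done = env.step(action)
        if r > best_r:
            best_z, best_r = list(z), r
    return best_z, best_r
```

In the attack the abstract describes, a reinforcement learning agent would be trained on an environment of this kind, and the latent vector it reaches would be passed through the generator to reconstruct the private data. The fixed `horizon` illustrates one way an episodic formulation can cap the number of queries per episode.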