In a model inversion (MI) attack, an adversary abuses access to a machine learning (ML) model to infer and reconstruct private training data. Remarkable progress has been made in the white-box and black-box setups, where the adversary has access to the complete model or to the model's soft output, respectively. However, there has been very limited study of the most challenging but practically important setup: label-only MI attacks, where the adversary has access only to the model's predicted label (hard label), without confidence scores or any other model information. In this work, we propose LOKT, a novel approach for label-only MI attacks. Our idea is based on transfer of knowledge from the opaque target model to surrogate models. Using these surrogate models, our approach can then harness advanced white-box attacks. We propose knowledge transfer based on generative modelling, and introduce a new model, Target model-assisted ACGAN (T-ACGAN), for effective knowledge transfer. Our method casts the challenging label-only MI into the more tractable white-box setup. We provide analysis to support that surrogate models based on our approach serve as effective proxies for the target model for MI. Our experiments show that our method significantly outperforms the existing SOTA label-only MI attack by more than 15% across all MI benchmarks. Furthermore, our method compares favorably in terms of query budget. Our study highlights rising privacy threats for ML models even when minimal information (i.e., hard labels) is exposed. Our code, demo, models and reconstructed data are available at our project page: https://ngoc-nguyen-0.github.io/lokt/
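To make the label-only setting concrete, the following is a minimal sketch of knowledge transfer from an opaque model to a surrogate using hard labels alone. It is not T-ACGAN and not the LOKT implementation: the target is a hypothetical linear classifier, random Gaussian queries stand in for generated samples, and the surrogate is a simple multi-class perceptron. All names (`make_target`, `train_surrogate`, `predict`) are assumptions introduced for illustration.

```python
import random


def argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def linear_scores(weights, x):
    """Per-class scores of a bias-free linear classifier."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def make_target(secret_weights):
    """Hypothetical opaque target: the caller sees only the hard label."""
    def query(x):
        # Scores stay hidden inside the closure; only the argmax is returned.
        return argmax(linear_scores(secret_weights, x))
    return query


def predict(weights, x):
    """Hard-label prediction of the surrogate."""
    return argmax(linear_scores(weights, x))


def train_surrogate(query, samples, num_classes, epochs=50):
    """Fit a multi-class perceptron to the target's hard labels."""
    dim = len(samples[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    labels = [query(x) for x in samples]  # one target query per sample
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            pred = predict(weights, x)
            if pred != y:
                # Move the labelled class toward x and the wrong class away.
                for i, x_i in enumerate(x):
                    weights[y][i] += x_i
                    weights[pred][i] -= x_i
                mistakes += 1
        if mistakes == 0:
            break
    return weights


rng = random.Random(0)
DIM, NUM_CLASSES = 5, 3

# Assumption: the attacker never reads these weights, only query() outputs.
secret = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_CLASSES)]
target = make_target(secret)

# Random queries stand in for samples that a generator would produce.
train = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(500)]
surrogate = train_surrogate(target, train, NUM_CLASSES)

# Agreement between surrogate and target on fresh inputs.
held_out = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(200)]
agreement = sum(predict(surrogate, x) == target(x) for x in held_out) / len(held_out)
print(f"surrogate/target agreement on held-out queries: {agreement:.2f}")
```

Once such a surrogate agrees closely with the target, its parameters are fully visible to the attacker, which is the sense in which the label-only problem is recast as a white-box one. The query count here (500) is an arbitrary choice for the toy example and says nothing about the query budget reported in the paper.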