Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints

From arXiv. This work has been accepted for publication at the 4th IEEE Conference on Secure and Trustworthy Machine Learning (IEEE SaTML 2026). The final version will be available on IEEE Xplore.

Model fingerprint detection has shown promise for tracing the provenance of AI-generated images in forensic applications. However, despite the inherent adversarial nature of these applications, existing evaluations rarely consider adversarial settings. We present the first systematic security evaluation of these techniques, formalizing threat models that encompass both white- and black-box access and two attack goals: fingerprint removal, which erases identifying traces to evade attribution, and fingerprint forgery, which seeks to cause misattribution to a target model. We implement five attack strategies and evaluate 14 representative fingerprinting methods across RGB, frequency, and learned-feature domains on 12 state-of-the-art image generators. Our experiments reveal a pronounced gap between clean and adversarial performance. Removal attacks are highly effective, often achieving success rates above 80% in white-box settings and over 50% under black-box access. While forgery is more challenging than removal, its success varies significantly across targeted models. We also observe a utility-robustness trade-off: accurate attribution methods are often vulnerable to attacks and, although some techniques are robust in specific settings, none achieves robustness and accuracy across all evaluated threat models. These findings highlight the need for techniques that balance robustness and accuracy, and we identify the most promising approaches toward this goal. Code available at: https://github.com/kaikaiyao/SmudgedFingerprints.
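To make the two attack goals concrete, the following is a minimal sketch of how removal and forgery success rates could be scored from an attributor's predictions. It is not taken from the paper or its repository: the function names, the label strings, and the choice to count only images that were correctly attributed before the attack (for removal) or that come from a non-target source (for forgery) are all assumptions made for illustration.

```python
from typing import Sequence


def removal_success_rate(
    true_labels: Sequence[str],
    clean_preds: Sequence[str],
    attacked_preds: Sequence[str],
) -> float:
    """Share of correctly attributed images that evade attribution after the attack."""
    # Assumption: only images attributed correctly before the attack are eligible.
    eligible = [
        (label, adv)
        for label, clean, adv in zip(true_labels, clean_preds, attacked_preds)
        if clean == label
    ]
    if not eligible:
        return 0.0
    # Removal succeeds when the attacked image is no longer traced to its true source.
    return sum(adv != label for label, adv in eligible) / len(eligible)


def forgery_success_rate(
    true_labels: Sequence[str],
    attacked_preds: Sequence[str],
    target: str,
) -> float:
    """Share of non-target images that are attributed to the target after the attack."""
    # Assumption: images that genuinely come from the target model are excluded.
    eligible = [adv for label, adv in zip(true_labels, attacked_preds) if label != target]
    if not eligible:
        return 0.0
    # Forgery succeeds only when the prediction lands on the attacker's chosen target.
    return sum(adv == target for adv in eligible) / len(eligible)


# Hypothetical generator labels "A", "B", "C" for four images.
true_labels = ["A", "A", "B", "B"]
clean_preds = ["A", "A", "B", "A"]
attacked_preds = ["B", "A", "C", "A"]

print(removal_success_rate(true_labels, clean_preds, attacked_preds))
print(forgery_success_rate(true_labels, attacked_preds, target="A"))
```

Under these assumed definitions, every successful forgery of a correctly attributed image also counts as a removal, but not the reverse, which is consistent with the abstract's observation that forgery is the harder goal.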
