Text-to-image (T2I) models are increasingly popular, producing a large share of AI-generated images online. To compare model quality, voting-based leaderboards have become the standard, relying on anonymized model outputs for fairness. In this work, we show that such anonymity can be easily broken. We find that generations from each T2I model form distinctive clusters in the image embedding space, enabling accurate deanonymization without prompt control or training data. On a benchmark of 22 models and 280 prompts (150K images), our centroid-based method achieves high accuracy and reveals systematic model-specific signatures. We further introduce a prompt-level distinguishability metric and conduct large-scale analyses showing how certain prompts can lead to near-perfect distinguishability. Our findings expose fundamental security flaws in T2I leaderboards and motivate stronger anonymization defenses.
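To make the attack concrete, the following is a minimal sketch of a nearest-centroid attributor of the kind the abstract describes. It is not the paper's actual pipeline: the embeddings, dimensions, and model names below are synthetic stand-ins, whereas the paper would use real image embeddings (e.g. from a pretrained vision encoder) of model outputs.

```python
# Illustrative sketch (assumption: synthetic data, not the paper's setup).
# Each model's image embeddings are averaged into a centroid; a new image
# is attributed to the model whose centroid is most cosine-similar.
import numpy as np

def fit_centroids(embeddings, labels):
    """Average each model's embeddings into one unit-normalized centroid."""
    centroids = {}
    for label in set(labels):
        vecs = np.array([e for e, l in zip(embeddings, labels) if l == label])
        c = vecs.mean(axis=0)
        centroids[label] = c / np.linalg.norm(c)
    return centroids

def predict(embedding, centroids):
    """Attribute an embedding to the nearest centroid by cosine similarity."""
    v = np.asarray(embedding, dtype=float)
    v = v / np.linalg.norm(v)
    return max(centroids, key=lambda m: float(v @ centroids[m]))

# Synthetic demo: two "models" whose outputs cluster in different directions.
rng = np.random.default_rng(0)
emb_a = rng.normal([1.0, 0.0, 0.0], 0.1, size=(50, 3))
emb_b = rng.normal([0.0, 1.0, 0.0], 0.1, size=(50, 3))
embeddings = list(emb_a) + list(emb_b)
labels = ["model_a"] * 50 + ["model_b"] * 50

centroids = fit_centroids(embeddings, labels)
pred = predict([0.9, 0.1, 0.0], centroids)
print(pred)  # attributed to model_a
```

Because the classifier needs only the per-model centroids, no prompt control or training of a dedicated model is required, which is what makes the deanonymization threat to anonymized leaderboards practical.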