Recent advances in artificial intelligence have enabled generative models to produce synthetic scientific images that are indistinguishable from pristine ones, posing a challenge even for expert scientists accustomed to working with such content. When exploited by organizations known as paper mills, which systematically generate fraudulent articles, these technologies can significantly accelerate the spread of unfounded scientific claims, potentially undermining trust in scientific research. While previous studies have explored black-box solutions, such as Convolutional Neural Networks, for identifying synthetic content, few have addressed the challenge of generalizing across different generative models or explained which artifacts in synthetic images inform the detection process. This study aims to identify explainable artifacts produced by state-of-the-art generative models (e.g., Generative Adversarial Networks and Diffusion Models) and to leverage them for open-set identification and source attribution (i.e., identifying the model that created the image).