AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval

As generative models advance, AI-generated content (AIGC) is becoming more realistic and is flooding the Internet. A recent study suggests that this causes source bias in text retrieval for web search: neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. First, we construct a benchmark suited to testing whether the bias exists. Extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias into text-image retrieval models. Specifically, these models tend to rank AI-generated images higher than real images, even though the AI-generated images do not exhibit more visually relevant features to the query than the real images do. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Further exploration reveals that including AI-generated images in the training data of the retrieval models exacerbates the bias. This creates a vicious cycle in which the invisible relevance bias grows progressively more severe. To address these issues and to elucidate the potential causes of invisible relevance, we introduce an effective training method aimed at alleviating the bias. We then use the proposed debiasing method to trace the causes of invisible relevance, revealing that AI-generated images induce the image encoder to embed additional information into their representations. This information exhibits a certain consistency across generated images with different semantics and can lead the retriever to estimate a higher relevance score.
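To make the notion of "ranking AI-generated images higher than real images" concrete, the following is a minimal sketch of how such a ranking gap could be quantified. It is an illustration under stated assumptions, not the benchmark or metric used in the paper: it assumes each query has one relevant real image and one AI-generated counterpart with the same semantics in the candidate pool, and it summarizes the gap as a relative difference in mean reciprocal rank. The function names (`ranks_from_scores`, `mean_reciprocal_rank_by_source`, `relative_delta`) and the toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def ranks_from_scores(scores: Sequence[float]) -> List[int]:
    """Return the 1-based rank of each candidate; the highest score gets rank 1."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def mean_reciprocal_rank_by_source(
    queries: Sequence[Tuple[Sequence[float], int, int]],
) -> Tuple[float, float]:
    """Compute the mean reciprocal rank (MRR) of real and generated targets.

    Each query is (scores, real_index, generated_index), where `scores` holds
    the retriever's relevance score for every candidate image (assumed layout).
    Returns (MRR of the real targets, MRR of the generated targets).
    """
    real_total = 0.0
    generated_total = 0.0
    for scores, real_index, generated_index in queries:
        ranks = ranks_from_scores(scores)
        real_total += 1.0 / ranks[real_index]
        generated_total += 1.0 / ranks[generated_index]
    count = len(queries)
    return real_total / count, generated_total / count


def relative_delta(real_metric: float, generated_metric: float) -> float:
    """Relative difference in percent; negative values favor generated images."""
    mean = (real_metric + generated_metric) / 2.0
    if mean == 0.0:
        return 0.0
    return 100.0 * (real_metric - generated_metric) / mean


if __name__ == "__main__":
    # Toy scores (made up for illustration): in both queries the generated
    # image outscores its real counterpart.
    toy_queries = [
        ([0.30, 0.35, 0.10], 0, 1),  # real at index 0, generated at index 1
        ([0.20, 0.40, 0.50], 1, 2),  # real at index 1, generated at index 2
    ]
    real_mrr, generated_mrr = mean_reciprocal_rank_by_source(toy_queries)
    print(f"real MRR={real_mrr:.2f}, generated MRR={generated_mrr:.2f}")
    print(f"relative delta={relative_delta(real_mrr, generated_mrr):.2f}%")
```

In this sketch, a relative delta near zero would indicate no preference between sources, while a clearly negative value would indicate that the retriever ranks generated images above real ones. Whether that gap is "invisible" in the paper's sense depends on also verifying that the generated images are not more visually relevant to the query, which this snippet does not attempt.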
