AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval

With the advancement of generative models, AI-generated content (AIGC) is becoming increasingly realistic and is flooding the Internet. A recent study suggests that this phenomenon has aggravated the problem of source bias in text retrieval for web search: neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. First, we construct a suitable benchmark to investigate whether the bias exists. Extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models. Specifically, text-image retrieval models tend to rank AI-generated images higher than real images, even though the AI-generated images do not exhibit features that are more visually relevant to the query than those of the real images. This invisible relevance bias is prevalent across retrieval models with varying training data and architectures. Furthermore, our subsequent exploration reveals that including AI-generated images in the training data of retrieval models exacerbates the invisible relevance bias. This triggers a vicious cycle in which the bias grows increasingly severe. To elucidate the potential causes of the invisible relevance bias and address the aforementioned issues, we introduce an effective training method aimed at alleviating it. We then apply the proposed debiasing method to retroactively identify the causes of the bias, revealing that AI-generated images induce the image encoder to embed additional information into their representations. This information exhibits a certain consistency across generated images with different semantics and causes the retriever to estimate higher relevance scores.
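To make the notion of "ranking AI-generated images higher than real images" concrete, the sketch below counts how often a generated image outscores a real image for the same text query. It is a minimal illustration under stated assumptions, not the paper's benchmark or metric. It assumes a CLIP-style dual encoder that scores relevance by cosine similarity between a text embedding and an image embedding, and one real and one generated image paired with each query. The names `cosine` and `generated_preference_rate` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embeddings (assumed relevance score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def generated_preference_rate(
    queries: Sequence[Vector],
    real_images: Sequence[Vector],
    generated_images: Sequence[Vector],
) -> float:
    """Hypothetical bias indicator: fraction of queries for which the
    AI-generated image scores strictly higher than the paired real image.

    The inputs are aligned: queries[i] is a text embedding, and
    real_images[i] / generated_images[i] are embeddings of a real and a
    generated image with the same semantics. A value near 0.5 suggests no
    systematic preference; a value well above 0.5 suggests a bias toward
    generated images.
    """
    if not queries or not (len(queries) == len(real_images) == len(generated_images)):
        raise ValueError("inputs must be non-empty and aligned")
    # Each comparison yields a bool; summing counts the generated-image "wins".
    wins = sum(
        cosine(q, g) > cosine(q, r)
        for q, r, g in zip(queries, real_images, generated_images)
    )
    return wins / len(queries)


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative (not data from the paper).
    queries = [[1.0, 0.0], [0.0, 1.0]]
    real = [[0.8, 0.6], [0.6, 0.8]]
    generated = [[0.9, 0.1], [0.5, 0.5]]
    print(generated_preference_rate(queries, real, generated))
```

In this toy input the generated image wins for the first query and loses for the second, so the script prints `0.5`. Because the indicator only compares scores within each real/generated pair, it isolates a preference for image source from differences in query difficulty. The abstract's point is that such a preference can appear even when the paired images are equally relevant to the query.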
