AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval

As generative models advance, AI-generated content (AIGC) is becoming more realistic and is flooding the Internet. A recent study suggests that this trend has aggravated source bias in text retrieval for web search: neural retrieval models tend to rank generated texts higher than human-written texts. In this paper, we extend the study of this bias to cross-modal retrieval. First, we construct a benchmark suitable for testing whether the bias exists. Extensive experiments on this benchmark reveal that AI-generated images introduce an invisible relevance bias to text-image retrieval models: the models tend to rank AI-generated images higher than real images, even though the AI-generated images do not exhibit more visually relevant features to the query than the real images do. This bias is prevalent across retrieval models with varying training data and architectures. Further exploration reveals that including AI-generated images in the training data of retrieval models exacerbates the bias, which triggers a vicious cycle in which the bias grows progressively more severe. To elucidate the potential causes of invisible relevance and to address these issues, we introduce a training method that alleviates the invisible relevance bias. We then apply this debiasing method to trace the causes of invisible relevance, and find that AI-generated images induce the image encoder to embed additional information into their representations. This information exhibits a certain consistency across generated images with different semantics and can lead the retriever to estimate a higher relevance score.
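To make the notion of ranking AI-generated images above real ones concrete, the following is a minimal sketch of one way such a preference could be quantified. It assumes that each query has one real image and one AI-generated image of comparable visual relevance, and that a retriever has already produced a relevance score for each. The function names and the toy scores are hypothetical and are not the benchmark or the metric used in this paper.

```python
from typing import Sequence


def _check_paired(real_scores: Sequence[float], generated_scores: Sequence[float]) -> None:
    """Validate that the two score lists are non-empty and aligned per query."""
    if len(real_scores) != len(generated_scores):
        raise ValueError("score lists must have one entry per query in both lists")
    if len(real_scores) == 0:
        raise ValueError("at least one query is required")


def generated_preference_rate(
    real_scores: Sequence[float], generated_scores: Sequence[float]
) -> float:
    """Fraction of queries for which the generated image outscores its paired real image.

    A value near 0.5 suggests no preference; values well above 0.5 suggest
    that the retriever favors generated images (an illustrative criterion only).
    """
    _check_paired(real_scores, generated_scores)
    wins = sum(1 for r, g in zip(real_scores, generated_scores) if g > r)
    return wins / len(real_scores)


def mean_score_gap(
    real_scores: Sequence[float], generated_scores: Sequence[float]
) -> float:
    """Average of (generated score - real score) over all queries."""
    _check_paired(real_scores, generated_scores)
    gaps = [g - r for r, g in zip(real_scores, generated_scores)]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # Hypothetical retriever scores for four queries (not measured data).
    real = [0.31, 0.28, 0.40, 0.35]
    generated = [0.33, 0.30, 0.38, 0.36]
    print(generated_preference_rate(real, generated))  # 0.75
    print(mean_score_gap(real, generated))
```

A pairwise comparison like this is only meaningful when the paired images are comparably relevant to the query. Otherwise a higher score for the generated image could reflect genuine visual relevance rather than the invisible bias described above.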
