With the rapid advancement of large language models (LLMs), information retrieval (IR) systems, such as search engines and recommender systems, have undergone a significant paradigm shift. This evolution, while heralding new opportunities, introduces emerging challenges, particularly in terms of bias and unfairness, which may threaten the information ecosystem. In this paper, we present a comprehensive survey of existing work on the emerging and pressing bias and unfairness issues that arise in IR systems with the integration of LLMs. We first unify bias and unfairness issues as distribution mismatch problems, providing the groundwork for categorizing various mitigation strategies through distribution alignment. Subsequently, we systematically delve into the specific bias and unfairness issues arising from three critical stages of LLM integration into IR systems: data collection, model development, and result evaluation. In doing so, we meticulously review and analyze the recent literature, focusing on the definitions, characteristics, and corresponding mitigation strategies associated with these issues. Finally, we identify and highlight open problems and challenges for future work, aiming to inspire researchers and stakeholders in the IR field and beyond to better understand and mitigate bias and unfairness issues in IR in this LLM era. We also maintain a GitHub repository of relevant papers and resources in this rising direction at https://github.com/KID-22/LLM-IR-Bias-Fairness-Survey.
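To make the "bias as distribution mismatch" framing concrete, the following is a minimal illustrative sketch, not taken from the survey itself: it measures how far an observed exposure distribution over item groups drifts from a target (fair) distribution using KL divergence, and applies a simple interpolation-based reweighting as one hypothetical form of distribution alignment. The two-group exposure numbers and the `alpha` parameter are invented for illustration.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): how far the observed distribution p drifts from the target q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical example: observed exposure of two item groups in ranked
# results, compared against a uniform "fair exposure" target.
observed = [0.8, 0.2]   # group A receives most of the exposure
target   = [0.5, 0.5]   # parity target distribution

bias_before = kl_divergence(observed, target)

# A simple "distribution alignment" step: interpolate the observed
# distribution toward the target (alpha controls alignment strength).
alpha = 0.5
aligned = [(1 - alpha) * o + alpha * t for o, t in zip(observed, target)]
bias_after = kl_divergence(aligned, target)
```

Under this toy setup, `bias_after` is strictly smaller than `bias_before`, illustrating how mitigation strategies can be read uniformly as reducing a divergence between an observed and a target distribution.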