The rapid advancement of language model technologies has opened new opportunities but also introduced new challenges related to bias and fairness. This paper explores the largely uncharted territory of potential biases in state-of-the-art universal text embedding models toward specific document and query writing styles within Information Retrieval (IR) systems. Our investigation reveals that different embedding models exhibit different preferences for document writing styles, with more informal and emotive styles less favored by most embedding models. In terms of query writing styles, many embedding models tend to match the style of the query with the style of the retrieved documents, while some show a consistent preference for specific styles. Text embedding models fine-tuned on synthetic data generated by LLMs display a consistent preference for certain styles of generated data. These biases in text-embedding-based IR systems can inadvertently silence or marginalize certain communication styles, thereby posing a significant threat to fairness in information retrieval. Finally, we compare the answer styles of Retrieval Augmented Generation (RAG) systems based on different LLMs and find that most text embedding models are biased toward particular LLMs' answer styles when used as evaluation metrics for answer correctness. This study sheds light on the critical issue of writing-style-based bias in IR systems, offering valuable insights for the development of fairer and more robust models.