Recently, Large Language Models (LLMs) have demonstrated a superior ability to serve as ranking models. However, concerns have arisen that LLMs may exhibit discriminatory ranking behaviors based on users' sensitive attributes (\eg gender). Worse still, in this paper we identify a subtler form of discrimination in LLMs, termed \textit{implicit ranking unfairness}, where LLMs exhibit discriminatory ranking patterns based solely on non-sensitive user profiles, such as user names. Such implicit unfairness is more widespread yet less noticeable, threatening the ethical foundation of LLM-based ranking. To explore it comprehensively, our analysis focuses on three research aspects: (1) we propose an evaluation method to measure the severity of implicit ranking unfairness; (2) we uncover the causes of such unfairness; and (3) to mitigate it effectively, we employ a pair-wise regression method to conduct fairness-aware data augmentation for LLM fine-tuning. Experiments demonstrate that our method outperforms existing approaches in ranking fairness, with only a small reduction in accuracy. Lastly, we emphasize the need for the community to identify and mitigate implicit ranking unfairness, aiming to avert the potential deterioration of the reinforced human-LLM ecosystem.