As fairness in information retrieval systems attracts growing research attention, an increasing number of fairness-aware algorithms have been proposed to support a sustainable and healthy retrieval ecosystem. However, group fairness evaluation metrics, the most widely adopted way of measuring fairness-aware algorithms, require group membership information, which demands extensive human annotation and is rarely available in general information retrieval datasets. This data sparsity significantly impedes progress in fairness-aware information retrieval research. Hence, a practical, scalable, low-cost group membership annotation method is needed to assist or replace human annotation. This study explores how to leverage language models to automatically annotate group membership for group fairness evaluation, focusing on annotation accuracy and its impact. Our experimental results show that BERT-based models outperform state-of-the-art large language models, including GPT and Mistral, achieving promising annotation accuracy with minimal supervision on recent fair-ranking datasets. Our impact-oriented evaluations reveal that small annotation errors do not degrade the effectiveness and robustness of group fairness evaluation. The proposed annotation method substantially reduces human effort and extends fairness-aware studies to more datasets.