As learning-to-rank (LTR) models are increasingly deployed for decision-making in areas with profound life implications, the FairML community has been developing fair LTR models. These models rely on the availability of sensitive demographic features such as race or sex. In practice, however, regulatory obstacles and privacy concerns often prevent such data from being collected and used. As a result, practitioners may either need to promote fairness despite the absence of these features or turn to demographic inference tools to attempt to infer them. Given that these tools are fallible, this paper aims to further understand how errors in demographic inference impact the fairness performance of popular fair LTR strategies. In which cases would it be better to keep such demographic attributes hidden from models rather than infer them? We examine a spectrum of fair LTR strategies, from fair LTR trained with demographic features hidden or inferred to fairness-unaware LTR followed by fair re-ranking. We conduct a controlled empirical investigation that models different levels of inference error by systematically perturbing the inferred sensitive attribute. We also perform three case studies with real-world datasets and popular open-source inference methods. Our findings reveal that as inference noise grows, LTR-based methods that incorporate fairness considerations into the learning process may increase bias. In contrast, fair re-ranking strategies are more robust to inference errors. All source code, data, and artifacts of our experimental study are available here: https://github.com/sewen007/hoiltr.git
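The abstract does not specify the perturbation protocol; a minimal sketch of one plausible implementation, assuming a binary sensitive attribute and a symmetric flip-probability noise model (the function name `perturb_attribute` and its parameters are our illustration, not the paper's code), could look like this:

```python
import numpy as np

def perturb_attribute(labels, error_rate, rng=None):
    """Flip each binary group label with probability `error_rate`,
    simulating demographic inference errors of a given severity.

    This is a hypothetical sketch of a symmetric-noise perturbation;
    the actual study may use a different (e.g. asymmetric) noise model.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels)
    flip = rng.random(labels.shape) < error_rate  # which entries to corrupt
    return np.where(flip, 1 - labels, labels)

# Example: simulate 20% inference error on a protected-group indicator.
true_groups = np.array([0, 1, 0, 1, 1, 0, 0, 1])
noisy_groups = perturb_attribute(true_groups, error_rate=0.2, rng=42)
```

Sweeping `error_rate` from 0 to 1 would then yield the controlled spectrum of inference-noise levels against which each fair LTR strategy's fairness metrics can be compared.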