We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups and then computes an appropriate measure of ranking error over that set. The matching step ensures that we compare subgroup outcomes between identically scored items, so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking, and we connect our approach to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we apply matched pair calibration to a real-world ranking data set to demonstrate its efficacy in detecting ranking bias.
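The match-then-measure logic can be illustrated with a short Python sketch. This is a hypothetical illustration, not the authors' implementation. It assumes each item is a `(score, group, outcome)` tuple with a binary observed outcome (for example, a click), matches items by rounding scores into bins, and uses the mean outcome difference over all cross-subgroup pairs in a bin as a stand-in for whatever ranking error measure one chooses. The function name `matched_pair_gap` and the `ndigits` binning parameter are illustrative assumptions.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def matched_pair_gap(items, group_a, group_b, ndigits=2):
    """Hypothetical sketch: mean outcome gap over score-matched cross-group pairs.

    items: iterable of (score, group, outcome) tuples.
    Scores are rounded to `ndigits` decimals to form matching bins (an assumption;
    the matching rule is a design choice, not taken from the source).
    Returns (mean gap of group_a minus group_b, number of matched pairs).
    """
    # Bin outcomes by (rounded) score, keeping the two subgroups separate.
    bins = defaultdict(lambda: {group_a: [], group_b: []})
    for score, group, outcome in items:
        if group in (group_a, group_b):
            bins[round(score, ndigits)][group].append(outcome)

    gaps = []
    for members in bins.values():
        # Every cross-group pair within a bin shares the same (rounded) score,
        # so any outcome difference is not explained by the model score.
        for y_a, y_b in product(members[group_a], members[group_b]):
            gaps.append(y_a - y_b)

    if not gaps:
        raise ValueError("no matched pairs found")
    return mean(gaps), len(gaps)


if __name__ == "__main__":
    items = [
        (0.9, "A", 1), (0.9, "B", 0),
        (0.5, "A", 1), (0.5, "B", 1), (0.5, "B", 0),
        (0.2, "A", 1),  # no group-B item at this score, so it is never matched
    ]
    gap, n_pairs = matched_pair_gap(items, "A", "B")
    print(f"mean gap = {gap:.3f} over {n_pairs} matched pairs")
```

Under these assumptions, a gap well away from zero suggests that equally scored items from the two subgroups do not fare equally. Taking every cross-group pair within a bin implicitly weights bins by the product of their subgroup counts. A one-to-one matching scheme would weight bins differently and is an equally plausible choice.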