We propose a test of fairness in score-based ranking systems called matched pair calibration. Our approach constructs a set of matched item pairs with minimal confounding differences between subgroups and then computes an appropriate measure of ranking error over that set. The matching step ensures that we compare subgroup outcomes between identically scored items, so that measured performance differences directly imply unfairness in subgroup-level exposures. We show how our approach generalizes the fairness intuitions of calibration from a binary classification setting to ranking, and we connect it to other proposals for ranking fairness measures. Moreover, our strategy shows how the logic of marginal outcome tests extends to cases where the analyst has access to model scores. Lastly, we apply matched pair calibration to a real-world ranking data set to demonstrate its efficacy in detecting ranking bias.
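The two steps described in the abstract (match items across subgroups on score, then measure error over the matched set) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' procedure: the `Item` record, the function names `matched_pairs` and `matched_pair_gap`, the score tolerance `tol`, the greedy matching rule, and the use of a mean outcome difference as the error measure. The abstract does not specify any of these details.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Item:
    """Hypothetical ranked item: model score, subgroup label, observed outcome."""
    score: float
    group: str
    outcome: float


def matched_pairs(items, group_a, group_b, tol=0.01):
    """Pair each group_a item with the lowest-scored unused group_b item
    whose score lies within `tol` of it (greedy two-pointer matching).

    This matching rule is an illustrative assumption, not the paper's.
    """
    a_items = sorted((i for i in items if i.group == group_a), key=lambda i: i.score)
    b_items = sorted((i for i in items if i.group == group_b), key=lambda i: i.score)
    pairs = []
    j = 0
    for a in a_items:
        # Skip group_b items scored too far below `a` to be a match.
        while j < len(b_items) and b_items[j].score < a.score - tol:
            j += 1
        if j < len(b_items) and abs(b_items[j].score - a.score) <= tol:
            pairs.append((a, b_items[j]))
            j += 1  # each group_b item is used at most once
    return pairs


def matched_pair_gap(pairs):
    """Mean outcome difference (group_a minus group_b) over matched pairs.

    If scores mean the same thing for both subgroups, this should be near zero.
    """
    if not pairs:
        raise ValueError("no matched pairs; consider a larger tolerance")
    return mean(a.outcome - b.outcome for a, b in pairs)


# Toy data, invented purely to exercise the functions.
items = [
    Item(0.90, "A", 1.0), Item(0.90, "B", 0.0),
    Item(0.50, "A", 1.0), Item(0.505, "B", 1.0),
    Item(0.20, "A", 0.0), Item(0.40, "B", 1.0),
]
pairs = matched_pairs(items, "A", "B", tol=0.01)
gap = matched_pair_gap(pairs)
print(len(pairs), gap)
```

In this sketch, a gap far from zero among near-identically scored items would suggest that equal scores do not translate into equal outcomes across subgroups. A real analysis would also need an uncertainty estimate for the gap, which is omitted here.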