Predictive parity (PP), also known as sufficiency, is a core definition of algorithmic fairness: a given model output must imply the same expected outcome regardless of group membership. Testing for and satisfying PP is especially important in settings where model scores are interpreted by humans or directly determine access to opportunity, such as healthcare or banking. Solutions for PP violations have primarily been studied through the lens of model calibration. However, we find that existing calibration-based tests and mitigation methods are designed for independent data, an assumption that often fails in large-scale applications such as social media or medical testing. In this work, we address this issue by developing a statistically rigorous test for PP, based on non-parametric regression, that is valid for dependent observations. We then apply our test to illustrate that PP testing results can differ significantly depending on whether independence or dependence is assumed. Lastly, we provide a mitigation method: a minimally biased post-processing transformation function that achieves PP.
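For readers who want the definition in symbols, the following is one common way to write sufficiency. The notation is an assumption introduced here for illustration and does not come from the abstract: $Y$ is the outcome, $S$ the model score, and $A$ the group attribute.

```latex
% Sufficiency / predictive parity (notation assumed, not from the source):
% the expected outcome at a given score does not depend on the group.
\[
  \mathbb{E}[\,Y \mid S = s,\ A = a\,] \;=\; \mathbb{E}[\,Y \mid S = s\,]
  \qquad \text{for all scores } s \text{ and groups } a .
\]
% Equivalently, Y is conditionally independent of A given S:
\[
  Y \perp A \mid S .
\]
```

Under this reading, a test for PP compares the group-wise regression functions $s \mapsto \mathbb{E}[Y \mid S = s, A = a]$ across groups; the abstract does not state the exact form of the authors' test statistic or of their transformation function.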