The error of an estimator can be decomposed into a (statistical) bias term, a variance term, and an irreducible noise term. Bias analysis formally asks the question: "How good are the predictions?" The role of bias in the error decomposition is clear: if we trust the labels/targets, then we want the estimator to have as low a bias as possible in order to minimize error. Fair machine learning is concerned with the question: "Are the predictions equally good for different demographic/social groups?" This question has naturally led to a variety of fairness metrics that compare some measure of statistical bias across subsets corresponding to socially privileged and socially disadvantaged groups. In this paper we propose a new family of performance measures based on group-wise parity in variance. We demonstrate when group-wise statistical bias analysis gives an incomplete picture, and what group-wise variance analysis can tell us in settings that differ in the magnitude of statistical bias. We develop and release an open-source library that reconciles uncertainty quantification techniques with fairness analysis, and use it to conduct an extensive empirical analysis of our variance-based fairness measures on standard benchmarks.
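To make the idea of group-wise variance concrete, here is a minimal sketch in Python. It is not the library released with the paper, and it does not reproduce the paper's measures. It assumes one common uncertainty quantification technique, the bootstrap: a least-squares model is refit on resampled training data, the variance of its predictions across refits is computed for each test point, and that variance is averaged within each group. The function name `groupwise_prediction_variance`, the synthetic data, and the use of a simple ratio as a parity summary are all illustrative assumptions.

```python
import numpy as np


def groupwise_prediction_variance(X_train, y_train, X_test, groups_test,
                                  n_boot=200, seed=0):
    """Hypothetical helper: mean bootstrap prediction variance per group."""
    rng = np.random.default_rng(seed)
    n = X_train.shape[0]
    # Prepend an intercept column to both design matrices.
    A_train = np.column_stack([np.ones(n), X_train])
    A_test = np.column_stack([np.ones(X_test.shape[0]), X_test])

    preds = np.empty((n_boot, X_test.shape[0]))
    for b in range(n_boot):
        # Resample the training set with replacement and refit least squares.
        idx = rng.integers(0, n, size=n)
        coef, *_ = np.linalg.lstsq(A_train[idx], y_train[idx], rcond=None)
        preds[b] = A_test @ coef

    # Variance of each test prediction across the bootstrap refits.
    per_point_var = preds.var(axis=0, ddof=1)
    # Average that variance within each group.
    return {g.item(): float(per_point_var[groups_test == g].mean())
            for g in np.unique(groups_test)}


# Synthetic data (an assumption for illustration): group 1 is
# under-represented in training and sits in a different feature region.
data_rng = np.random.default_rng(42)
n0, n1 = 900, 100
x_train = np.concatenate([data_rng.normal(0.0, 1.0, n0),
                          data_rng.normal(3.0, 1.0, n1)])
y_train = 2.0 * x_train + data_rng.normal(0.0, 1.0, n0 + n1)
x_test = np.concatenate([data_rng.normal(0.0, 1.0, 200),
                         data_rng.normal(3.0, 1.0, 200)])
groups_test = np.repeat([0, 1], 200)

variances = groupwise_prediction_variance(
    x_train[:, None], y_train, x_test[:, None], groups_test)
# One possible parity summary: ratio of the groups' mean variances.
variance_ratio = variances[1] / variances[0]
print(variances, variance_ratio)
```

In this setup the true relationship is the same for both groups, so a group-wise comparison of statistical bias would show little difference, while the predictions for the under-represented group are less stable across refits. A ratio far from 1 flags that disparity. Other parity summaries (for example a difference instead of a ratio) would serve equally well in a sketch like this.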