Statistical measures of group fairness in machine learning reflect the gap in an algorithm's performance across different groups. These measures, however, exhibit high variance across training runs, which makes them unreliable for empirical evaluation of fairness. What causes this high variance? We investigate how different sources of randomness in training neural networks affect group fairness. We show that the variance in group fairness measures is rooted in the high volatility of the learning process on under-represented groups. Further, we identify the stochasticity of data order during training as the dominant source of randomness. Based on these findings, we show how one can control group-level accuracy (i.e., model fairness), with high efficiency and negligible impact on the model's overall performance, by simply changing the data order for a single epoch.
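To make the notion of variance in a group fairness measure concrete, the following is a minimal sketch and not the procedure used in this work. It assumes the fairness measure is the gap between the best and worst per-group accuracy, and it assumes each training run yields one list of predictions on a shared evaluation set. The function names (`group_accuracies`, `accuracy_gap`, `gap_spread`) and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def group_accuracies(y_true, y_pred, groups):
    """Return {group: accuracy} for every distinct group label."""
    correct, total = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + (1 if t == p else 0)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups):
    """Assumed fairness measure: best group accuracy minus worst."""
    accs = group_accuracies(y_true, y_pred, groups)
    return max(accs.values()) - min(accs.values())


def gap_spread(y_true, runs_pred, groups):
    """Mean and population standard deviation of the gap across runs."""
    gaps = [accuracy_gap(y_true, pred, groups) for pred in runs_pred]
    return mean(gaps), pstdev(gaps)


# Toy data (made up): group "b" is under-represented, 2 of 8 examples.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["a"] * 6 + ["b"] * 2

# Hypothetical predictions from three training runs that differ only in
# their random seed.
runs_pred = [
    [1, 0, 1, 0, 1, 0, 1, 0],  # every example correct
    [1, 0, 1, 0, 1, 0, 1, 1],  # one error in group "b"
    [0, 0, 1, 0, 1, 0, 1, 0],  # one error in group "a"
]

avg_gap, gap_std = gap_spread(y_true, runs_pred, groups)
print(f"mean gap = {avg_gap:.3f}, std across runs = {gap_std:.3f}")
```

In this toy setup a single misclassified example moves the accuracy of the small group by 0.5 but the accuracy of the large group by only about 0.17, which illustrates why a gap-based measure can swing widely between otherwise similar runs.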