In fair machine learning, one source of performance disparities between groups is overfitting to groups with relatively few training samples. We derive group-specific bounds on the generalization error of welfare-centric fair machine learning that benefit from the larger sample size of the majority group. We do this by considering group-specific Rademacher averages over a restricted hypothesis class, which contains the family of models likely to perform well with respect to a fair learning objective (e.g., a power mean). Our simulations show that these bounds improve on a naive method, as predicted by theory, with particularly significant improvement for smaller group sizes.
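The following minimal Python sketch makes the ingredients of the abstract concrete. It assumes losses in [0, 1] that have already been computed for a finite hypothesis class on each group's sample. It computes:

- a weighted power mean of per-group empirical risks, assumed here as the fair objective (p ≥ 1 weights the worse-off group more heavily);
- a Monte Carlo estimate of each group's empirical Rademacher average;
- the textbook uniform Rademacher generalization bound.

The restriction rule is a hypothetical illustration, not the authors' construction: it keeps the hypotheses whose empirical objective lies within `slack` of the best. The helper names `power_mean`, `empirical_rademacher`, `rademacher_bound`, and `restricted_group_bounds` are likewise hypothetical. The data-dependent restriction is not charged against δ here, so the output demonstrates the mechanics only and is not a valid bound.

```python
import math
import random
from typing import List, Sequence


def power_mean(values: Sequence[float], weights: Sequence[float], p: float) -> float:
    """Weighted power mean M_p(values; weights); weights are assumed to sum to 1."""
    if p == math.inf:
        return max(values)
    if p == -math.inf:
        return min(values)
    if p == 0:
        # Limit p -> 0: weighted geometric mean (requires strictly positive values).
        return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)))
    return sum(w * v ** p for v, w in zip(values, weights)) ** (1.0 / p)


def empirical_rademacher(losses: List[List[float]], n_draws: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of E_sigma[max_h (1/m) sum_i sigma_i * losses[h][i]]."""
    rng = random.Random(seed)
    m = len(losses[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw m independent Rademacher signs (+1 or -1 with equal probability).
        sigma = [rng.choice((-1.0, 1.0)) for _ in range(m)]
        total += max(sum(s * l for s, l in zip(sigma, row)) / m for row in losses)
    return total / n_draws


def rademacher_bound(emp_risk: float, rad: float, m: int, delta: float) -> float:
    """Standard uniform bound for [0, 1] losses; holds w.p. >= 1 - delta for a fixed class."""
    return emp_risk + 2.0 * rad + 3.0 * math.sqrt(math.log(2.0 / delta) / (2.0 * m))


def restricted_group_bounds(
    group_losses: List[List[List[float]]],
    weights: Sequence[float],
    p: float,
    slack: float,
    delta: float,
):
    """group_losses[g][h][i] is the loss of hypothesis h on sample i of group g."""
    n_groups, n_h = len(group_losses), len(group_losses[0])
    # Per-group empirical risks emp[g][h].
    emp = [[sum(row) / len(row) for row in losses] for losses in group_losses]
    # Fair objective of each hypothesis: power mean of its per-group risks.
    objective = [power_mean([emp[g][h] for g in range(n_groups)], weights, p) for h in range(n_h)]
    best = min(objective)
    # HYPOTHETICAL restriction rule: keep only near-optimal hypotheses.
    keep = [h for h in range(n_h) if objective[h] <= best + slack]
    bounds = []
    for g, losses in enumerate(group_losses):
        # Group-specific Rademacher average computed over the restricted class only.
        rad = empirical_rademacher([losses[h] for h in keep])
        m_g = len(losses[0])
        bounds.append({h: rademacher_bound(emp[g][h], rad, m_g, delta) for h in keep})
    return keep, bounds
```

The restricted class is a subset of the full class, so its Rademacher average on a small group can only shrink. With a fixed seed, the Monte Carlo estimate above is monotone in the same way, because every σ draw takes a maximum over fewer hypotheses. This is the mechanism by which restricting the class can tighten the bounds for the minority group.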