Machine learning models must balance accuracy and fairness, but these goals often conflict, particularly when data are drawn from multiple demographic groups. A useful tool for understanding this trade-off is the fairness-accuracy (FA) frontier, which characterizes the set of models whose fairness cannot be improved without sacrificing accuracy, and vice versa. Prior analyses of the FA frontier give a full characterization under the assumption of complete knowledge of the population distributions -- an unrealistic ideal. We study the FA frontier in the finite-sample regime, showing how it deviates from its population counterpart and quantifying the worst-case gap between the two. In particular, we derive minimax-optimal estimators whose form depends on the designer's knowledge of the covariate distribution. For each estimator, we characterize how finite-sample effects asymmetrically impact each group's risk, and we identify optimal sample-allocation strategies. Our results turn the FA frontier from a theoretical construct into a practical tool for policymakers and practitioners, who must often design algorithms with limited data.
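The FA frontier described above is a Pareto frontier over models scored on two criteria. As a minimal illustrative sketch (the scores, model names, and the dominance criterion shown here are hypothetical, not taken from the paper), one can compute an empirical frontier from a finite set of candidate models as follows:

```python
# Hypothetical sketch: the FA frontier as a Pareto frontier over candidate
# models, each scored by accuracy (higher is better) and a fairness gap
# between groups (lower is better). All numbers are illustrative.

def fa_frontier(models):
    """Return the models not dominated in both accuracy and fairness.

    models: list of (name, accuracy, fairness_gap) tuples.
    A model is dominated if some other model is at least as good on
    both criteria and strictly better on at least one.
    """
    frontier = []
    for name, acc, gap in models:
        dominated = any(
            (a2 >= acc and g2 <= gap) and (a2 > acc or g2 < gap)
            for _, a2, g2 in models
        )
        if not dominated:
            frontier.append((name, acc, gap))
    return frontier

candidates = [
    ("m1", 0.92, 0.15),  # accurate but with a large fairness gap
    ("m2", 0.88, 0.05),  # fairer, less accurate
    ("m3", 0.85, 0.10),  # dominated by m2 on both criteria
    ("m4", 0.90, 0.08),  # intermediate trade-off
]
print(fa_frontier(candidates))  # m3 is excluded; m1, m2, m4 remain
```

With population distributions known, this frontier is exact; the finite-sample question studied in the paper is how far such an empirical frontier can deviate from its population counterpart.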