Finding a Fair Scoring Function for Top-$k$ Selection: From Hardness to Practice

Selecting a subset of the $k$ "best" items from a dataset of $n$ items, based on a scoring function, is a key task in decision-making. Given the rise of automated decision-making software, it is important that the outcome of this process, called top-$k$ selection, is fair. Here we consider the problem of identifying a fair linear scoring function for top-$k$ selection. The function computes a score for each item as a weighted sum of its (numerical) attribute values, and must ensure that the selected subset includes adequate representation of a minority or historically disadvantaged group. Existing algorithms do not scale well, particularly in higher dimensions. Our hardness analysis shows that in more than two dimensions, no algorithm is likely to scale well with respect to dataset size, and the computational complexity is likely to increase rapidly with dimensionality. However, the hardness results also provide key insights that guide algorithm design, leading to our two-pronged solution: (1) For small values of $k$, our hardness analysis reveals a gap in the hardness barrier. By addressing various engineering challenges, including achieving efficient parallelism, we turn this potential for efficiency into an optimized algorithm that delivers substantial practical performance gains. (2) For large values of $k$, where the hardness is robust, we employ a practically efficient algorithm which, despite a worse theoretical bound, achieves superior real-world performance. Experimental evaluations on real-world datasets then explore scenarios where worst-case behavior does not manifest, identifying the factors critical to practical performance. Our solution achieves speed-ups of up to several orders of magnitude compared to the state of the art, an efficiency made possible through a tight integration of hardness analysis, algorithm design, practical engineering, and empirical evaluation.
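
To make the problem setup concrete, the following minimal Python sketch illustrates linear top-$k$ selection together with a simple representation check. It is not the algorithm described above. The names `top_k` and `is_fair`, the toy data, the threshold `min_protected`, and the tie-breaking rule are all assumptions introduced here for illustration; the abstract does not specify how "adequate representation" is measured.

```python
from typing import List, Sequence

def top_k(items: Sequence[Sequence[float]],
          weights: Sequence[float],
          k: int) -> List[int]:
    """Return the indices of the k highest-scoring items.

    The score of an item is the weighted sum of its attribute values.
    """
    scores = [sum(w * x for w, x in zip(weights, item)) for item in items]
    # Sort by descending score; ties are broken by index (an assumption).
    order = sorted(range(len(items)), key=lambda i: (-scores[i], i))
    return order[:k]

def is_fair(selected: Sequence[int],
            protected: Sequence[bool],
            min_protected: int) -> bool:
    """Hypothetical fairness check: at least `min_protected` selected
    items belong to the protected group."""
    return sum(1 for i in selected if protected[i]) >= min_protected

# Toy dataset: five items with two numerical attributes each (made up).
items = [(0.9, 0.2), (0.8, 0.3), (0.4, 0.9), (0.3, 0.8), (0.7, 0.1)]
protected = [False, False, True, True, False]

unfair_selection = top_k(items, (1.0, 0.0), k=2)  # only attribute 1 counts
fair_selection = top_k(items, (0.3, 0.7), k=2)    # attribute 2 weighted more

print(unfair_selection, is_fair(unfair_selection, protected, 1))
print(fair_selection, is_fair(fair_selection, protected, 1))
```

In this toy instance, changing the weight vector changes which items are selected and therefore whether the constraint holds. The problem studied here is to find such a weight vector efficiently; the sketch only evaluates a given one.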
