Aggregating multiple input rankings into a consensus ranking is essential in various fields such as social choice theory, hiring, college admissions, web search, and databases. A major challenge is that the optimal consensus ranking might be biased against individual candidates or groups, especially those from marginalized communities. This concern has motivated recent work on fairness in rank aggregation, where the goal is to ensure that candidates from different groups are fairly represented in the top-$k$ positions of the aggregated ranking. We study this fair rank aggregation problem using the Kendall tau distance as the underlying metric. While a polynomial-time approximation scheme (PTAS) is known for the classical rank aggregation problem, the fair variant has only a rather straightforward 3-approximation algorithm, due to Wei et al. (SIGMOD'22) and Chakraborty et al. (NeurIPS'22), which finds the closest fair ranking to each input ranking and then outputs the best of these. In this paper, we first give a novel algorithm that achieves a $(2+\varepsilon)$-approximation for any $\varepsilon > 0$, significantly improving on the 3-approximation bound. Next, we give a $2.881$-approximation algorithm for fair rank aggregation that works for any fairness notion, provided a closest fair ranking can be found, again beating the 3-approximation bound. We complement our theoretical guarantees with extensive experiments on various real-world datasets, comparing our algorithm against state-of-the-art algorithms to further establish its effectiveness.
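The 3-approximation baseline of Wei et al. and Chakraborty et al. can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the fairness notion (per-group lower and upper bounds on how many candidates appear in the top-$k$, passed as `lower` and `upper`), the sum-of-Kendall-tau-distances objective, and all function names are assumptions made for the example. `closest_fair_ranking` uses exhaustive search, so it is only usable for a handful of candidates.

```python
from itertools import combinations, permutations
from typing import Dict, List, Sequence, Tuple

Ranking = Tuple[str, ...]


def kendall_tau(a: Sequence[str], b: Sequence[str]) -> int:
    """Count candidate pairs that rankings a and b order differently."""
    pos_b = {c: i for i, c in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x ranked above y in a
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def is_fair(r: Sequence[str], group: Dict[str, str], k: int,
            lower: Dict[str, int], upper: Dict[str, int]) -> bool:
    """Hypothetical fairness notion: each group g has between lower[g]
    and upper[g] members among the top-k positions."""
    counts = {g: 0 for g in lower}
    for c in r[:k]:
        counts[group[c]] += 1
    return all(lower[g] <= counts[g] <= upper[g] for g in lower)


def closest_fair_ranking(r: Ranking, group: Dict[str, str], k: int,
                         lower: Dict[str, int], upper: Dict[str, int]) -> Ranking:
    """Brute force over all permutations (exponential; tiny inputs only).
    Raises ValueError if no fair ranking exists."""
    fair = (p for p in permutations(r) if is_fair(p, group, k, lower, upper))
    return min(fair, key=lambda p: kendall_tau(r, p))


def aggregate_cost(sigma: Sequence[str], inputs: List[Ranking]) -> int:
    """Assumed objective: sum of Kendall tau distances to the inputs."""
    return sum(kendall_tau(sigma, r) for r in inputs)


def best_of_closest_fair(inputs: List[Ranking], group: Dict[str, str], k: int,
                         lower: Dict[str, int],
                         upper: Dict[str, int]) -> Tuple[Ranking, int]:
    """Baseline: closest fair ranking to each input, keep the cheapest one."""
    candidates = [closest_fair_ranking(r, group, k, lower, upper) for r in inputs]
    best = min(candidates, key=lambda f: aggregate_cost(f, inputs))
    return best, aggregate_cost(best, inputs)


if __name__ == "__main__":
    inputs = [("a", "b", "c", "d"), ("b", "a", "d", "c"), ("a", "c", "b", "d")]
    group = {"a": "x", "b": "x", "c": "y", "d": "y"}
    # Top-2 must contain exactly one candidate from each group
    print(best_of_closest_fair(inputs, group, k=2,
                               lower={"x": 1, "y": 1}, upper={"x": 1, "y": 1}))
    # → (('a', 'c', 'b', 'd'), 4)
```

The outer loop treats `closest_fair_ranking` as a black box, so a faster routine for a particular fairness notion could be substituted without changing the rest of the sketch.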