Because acquiring reliable ground-truth labels is usually costly or infeasible, practitioners typically resort to crowdsourcing and aggregating noisy human annotations. Aggregating subjective labels, however, may amplify individual biases, particularly with respect to sensitive features, which raises fairness concerns. Yet fairness in crowdsourced aggregation remains largely unexplored: there are no existing convergence guarantees, and only limited post-processing approaches for enforcing $\varepsilon$-fairness under demographic parity. We address this gap by analyzing the fairness of crowdsourced aggregation methods within the $\varepsilon$-fairness framework, for Majority Vote and Optimal Bayesian aggregation. In the small-crowd regime, we derive an upper bound on the fairness gap of Majority Vote in terms of the fairness gaps of the individual annotators. We further show that the fairness gap of the aggregated consensus converges exponentially fast to that of the ground truth under interpretable conditions. Since the ground truth itself may still be unfair, we generalize a state-of-the-art multiclass fairness post-processing algorithm from the continuous to the discrete setting, which enforces strict demographic parity constraints on any aggregation rule. Experiments on synthetic and real datasets demonstrate the effectiveness of our approach and corroborate the theoretical insights.
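To make the quantities in the abstract concrete, the following is a minimal sketch of Majority Vote aggregation and a demographic parity gap for binary labels with a binary sensitive attribute. The abstract does not give formal definitions, so the gap used here (the absolute difference in positive-label rates between the two groups, with a rule called $\varepsilon$-fair when the gap is at most $\varepsilon$) is an assumption, as are the function names `majority_vote`, `dp_gap`, and `is_eps_fair` and the toy annotations. It is an illustration only, not the paper's method or bound.

```python
def majority_vote(votes):
    """Aggregate binary votes (0/1) for one item.

    Ties are broken toward 0, which is an arbitrary choice for this sketch.
    """
    return 1 if 2 * sum(votes) > len(votes) else 0


def dp_gap(labels, groups):
    """Assumed demographic parity gap for binary labels and two groups:
    |P(label = 1 | group = 0) - P(label = 1 | group = 1)|.
    """
    rates = []
    for g in (0, 1):
        members = [y for y, s in zip(labels, groups) if s == g]
        rates.append(sum(members) / len(members))
    return abs(rates[0] - rates[1])


def is_eps_fair(labels, groups, eps):
    """Assumed epsilon-fairness check: the gap does not exceed eps."""
    return dp_gap(labels, groups) <= eps


# Hypothetical toy data: 8 items, the first 4 in group 0, the last 4 in group 1.
groups = [0, 0, 0, 0, 1, 1, 1, 1]

# Rows are annotators, columns are items (made-up labels for illustration).
annotations = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 0, 0, 0],
]

# Majority Vote consensus, one label per item.
consensus = [majority_vote(column) for column in zip(*annotations)]

annotator_gaps = [dp_gap(row, groups) for row in annotations]
consensus_gap = dp_gap(consensus, groups)

print("annotator gaps:", annotator_gaps)
print("consensus gap:", consensus_gap)
print("consensus is 0.3-fair:", is_eps_fair(consensus, groups, 0.3))
```

In this toy example the consensus gap is as large as the largest individual gap and larger than the other two, which illustrates why aggregation alone does not guarantee a fairer outcome than the individual annotators provide.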