Operationalizing algorithmic fairness raises several practical challenges, not least the availability and reliability of protected attributes in datasets. In real-world contexts, practical and legal impediments may prevent the collection and use of demographic data, making it difficult to ensure algorithmic fairness. While early fairness algorithms did not consider these limitations, recent proposals aim to achieve algorithmic fairness in classification either by accounting for noise in protected attributes or by not using protected attributes at all. To the best of our knowledge, this is the first head-to-head study of fair classification algorithms to compare attribute-reliant, noise-tolerant, and attribute-blind algorithms along the dual axes of predictivity and fairness. We evaluated these algorithms via case studies on four real-world datasets and synthetic perturbations. Our study reveals that attribute-blind and noise-tolerant fair classifiers can potentially achieve a level of performance similar to that of attribute-reliant algorithms, even when protected attributes are noisy. However, implementing them in practice requires careful attention to nuance. Our study provides insights into the practical implications of using fair classification algorithms in scenarios where protected attributes are noisy or only partially available.