In this paper, we propose an approach for exploring the dataset features that introduce bias into downstream machine-learning tasks. Depending on the data format, we use different techniques to map instances into a similarity feature space. Because the method can adjust the resolution of pairwise similarity, it gives insight into the relationship between a dataset's classification complexity and model fairness. Experimental results support the applicability of the similarity network in promoting fair models. Moreover, our methodology not only appears promising for fair downstream tasks such as classification, but also performs well in imputing and augmenting the dataset while satisfying fairness criteria such as demographic parity and accounting for class imbalance.
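To make two of the notions above concrete, the following is a minimal sketch, not the authors' implementation. It assumes cosine similarity as the pairwise measure and a simple threshold as the "resolution" control; the abstract specifies neither, and the function names `similarity_network` and `demographic_parity_difference` are hypothetical. The second function computes the usual demographic parity gap, i.e. the largest difference in positive-prediction rate between groups.

```python
import numpy as np


def similarity_network(X, threshold):
    """Boolean adjacency matrix linking instances whose cosine
    similarity is at least `threshold` (assumed resolution knob)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    # Avoid division by zero for all-zero feature vectors.
    unit = X / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T
    adj = sim >= threshold
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


def demographic_parity_difference(y_pred, group):
    """Largest gap in positive-prediction rate across groups
    (0 means demographic parity holds exactly)."""
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)


# Toy data: three instances with two features each.
X = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
coarse = similarity_network(X, threshold=0.05)  # more edges
fine = similarity_network(X, threshold=0.9)     # fewer edges
```

Under these assumptions, lowering the threshold yields a denser network and raising it a sparser one, which is one way to read "adjusting the resolution of pairwise similarity".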