We propose a method for certifying the fairness of classification results produced by a widely used supervised learning algorithm, k-nearest neighbors (KNN), under the assumption that the training data may contain historical bias caused by systematic mislabeling of samples from a protected minority group. To the best of our knowledge, this is the first certification method for KNN based on three variants of the fairness definition: individual fairness, $\epsilon$-fairness, and label-flipping fairness. We first define the fairness certification problem for KNN and then propose sound approximations of the complex arithmetic computations used in the state-of-the-art KNN algorithm. These approximations lift the computation from the concrete domain to an abstract domain, which reduces the computational cost. We demonstrate the effectiveness of this abstract-interpretation-based technique through experimental evaluation on six datasets widely used in the fairness research literature. We also show that the method is accurate enough to obtain fairness certifications for a large number of test inputs, despite the presence of historical bias in the datasets.
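To make the idea of label-flipping certification concrete, the following is a minimal sketch, not the method proposed in the abstract. It assumes a fixed k, Euclidean distance, and a bound `max_flips` on how many training labels may have been flipped; the function names and the margin-based check are illustrative assumptions. Because flipping labels does not move any training point, the set of k nearest neighbors stays the same, and each flip can shrink the gap between the top label and the runner-up by at most two votes. The sketch ignores the parameter-selection step of a full KNN pipeline and does not use abstract interpretation.

```python
from collections import Counter
import math


def knn_neighbors(train_x, x, k):
    """Indices of the k training points closest to x (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    return order[:k]


def certify_label_flip(train_x, train_y, x, k, max_flips):
    """Return (predicted_label, certified).

    certified is True only if the prediction cannot change when at most
    max_flips training labels are flipped. This is a conservative,
    hypothetical check: False means "not proven", not "unfair".
    """
    idx = knn_neighbors(train_x, x, k)
    counts = Counter(train_y[i] for i in idx).most_common()
    top_label, top = counts[0]
    runner_up = counts[1][1] if len(counts) > 1 else 0
    # Each flip removes at most one vote from the top label and adds at most
    # one vote to a competitor, so the margin drops by at most 2 per flip.
    certified = top - runner_up > 2 * max_flips
    return top_label, certified
```

For instance, if all three nearest neighbors of a test input share one label, the prediction is certified against a single flip (margin 3 exceeds 2) but not against two flips (margin 3 does not exceed 4).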