As algorithmic decision-making systems become more prevalent in society, ensuring the fairness of these systems is becoming increasingly important. Whilst there has been substantial research in building fair algorithmic decision-making systems, the majority of these methods require access to the training data, including personal characteristics, and are not transparent regarding which individuals are classified unfairly. In this paper, we propose a novel model-agnostic argumentation-based method to determine why an individual is classified differently in comparison to similar individuals. Our method uses a quantitative argumentation framework to represent attribute-value pairs of an individual and of those similar to them, and uses a well-known semantics to identify the attribute-value pairs in the individual contributing most to their different classification. We evaluate our method on two datasets commonly used in the fairness literature and illustrate its effectiveness in the identification of bias.