As algorithmic decision-making systems become more prevalent in society, ensuring their fairness is becoming increasingly important. Whilst there has been substantial research on building fair algorithmic decision-making systems, the majority of these methods require access to the training data, including personal characteristics, and are not transparent about which individuals are classified unfairly. In this paper, we propose a novel model-agnostic, argumentation-based method to determine why an individual is classified differently from similar individuals. Our method uses a quantitative argumentation framework to represent the attribute-value pairs of an individual and of those similar to them, and uses a well-known semantics to identify the individual's attribute-value pairs that contribute most to their different classification. We evaluate our method on two datasets commonly used in the fairness literature and illustrate its effectiveness in identifying bias.