Algorithms are increasingly used to automate or aid human decisions, yet recent research shows that these algorithms may exhibit bias across legally protected demographic groups. However, data on these groups may be unavailable to organizations or external auditors due to privacy legislation. This paper studies bias detection using an unsupervised bias detection tool when data on demographic groups are unavailable. We collaborated with the Dutch Executive Agency for Education to audit an algorithm that was used to assign risk scores to college students at the national level in the Netherlands between 2012 and 2023. Our audit covers more than 250,000 students across the country. The unsupervised bias detection tool highlights known disparities between students with a non-European migration background and students with a Dutch or European migration background. Our contributions are twofold: (1) we assess bias in a real-world, large-scale, and high-stakes decision-making process run by a governmental organization; (2) we release the unsupervised bias detection tool as an open-source library that others can use to conduct their own bias audits. Our work serves as a starting point for a deliberative assessment by human experts to evaluate potential discrimination in algorithmic decision-making.