Bias detection and mitigation is an active area of research in machine learning. This work extends the authors' previous research to provide a more rigorous and complete analysis of the bias found in AI predictive models. Six years of admissions data from a large urban research university were used to build an AI model that predicts whether a given student would be directly admitted into the School of Science under various scenarios. During this period, submission of standardized test scores as part of an application became optional, raising questions about the impact of test scores on admission decisions. We developed and analyzed AI models to understand which variables are important in admissions decisions and how the decision to exclude test scores affects the demographics of the admitted students. We then evaluated the predictive models to detect and analyze the biases they may carry with respect to three variables chosen to represent sensitive populations: gender, race, and whether a student is the first in their family to attend college. We also extended our analysis to show that the detected biases were persistent. Finally, we included several fairness metrics in our analysis and discussed the uses and limitations of these metrics.
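To make the kind of group-fairness metrics mentioned above concrete, the sketch below computes two commonly used ones, demographic parity difference and equal opportunity difference, for a binary admission classifier. The function names, the synthetic predictions, and the binary group encoding are illustrative assumptions, not the paper's actual data or metric set.

```python
import numpy as np

def demographic_parity_diff(y_pred, group):
    """Gap in positive-prediction (admission) rates between the two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

def equal_opportunity_diff(y_true, y_pred, group):
    """Gap in true-positive rates (recall among truly admitted) between groups."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    tpr_0 = y_pred[(group == 0) & (y_true == 1)].mean()
    tpr_1 = y_pred[(group == 1) & (y_true == 1)].mean()
    return tpr_0 - tpr_1

# Synthetic example (made-up labels): group 0 is predicted "admit" more often.
y_true = np.array([1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(demographic_parity_diff(y_pred, group))         # admission-rate gap
print(equal_opportunity_diff(y_true, y_pred, group))  # true-positive-rate gap
```

A value of 0 on either metric would indicate parity between the two groups on that criterion; as the abstract notes, such metrics capture different notions of fairness and cannot in general all be satisfied at once.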