To measure bias, we encourage teams to consider using AUC Gap: the absolute difference between the highest and lowest test AUC across subgroups (e.g., gender, race, socioeconomic status (SES), prior knowledge). The measure is agnostic to the AI/ML algorithm used, and it captures the disparity in model performance across any number of subgroups, which enables non-binary fairness assessments, for example for intersectional identity groups. The teams use a wide range of AI/ML models in pursuit of a common goal: doubling math achievement in low-income middle schools. Ensuring that these models, which are trained on datasets collected in many different contexts, do not introduce or amplify biases is important for achieving that goal. We offer a versatile and easy-to-compute measure of model bias that all the teams can use, in order to create a common benchmark and an analytical basis for sharing which strategies have worked for different teams.
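As an illustration of the definition above, AUC Gap can be sketched in a few lines: compute the test AUC separately for each subgroup, then subtract the lowest from the highest. The sketch below is a minimal pure-Python version under stated assumptions, not a reference implementation. The function names `auc` and `auc_gap`, the binary 0/1 label encoding, and the toy data are all hypothetical and chosen only for illustration. AUC is computed here through its pairwise-ranking (Mann-Whitney) interpretation, with ties counted as one half.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Assumes binary labels encoded as 0/1; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_gap(labels, scores, groups):
    """Return (gap, per_group_auc): highest minus lowest subgroup AUC."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    per_group = {g: auc(ys, ss) for g, (ys, ss) in by_group.items()}
    return max(per_group.values()) - min(per_group.values()), per_group


# Toy test-set predictions for three hypothetical subgroups A, B, and C.
labels = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.1, 0.9, 0.4, 0.6, 0.2, 0.7, 0.7, 0.3, 0.5]
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

gap, per_group = auc_gap(labels, scores, groups)
print(per_group)  # {'A': 1.0, 'B': 0.75, 'C': 0.375}
print(gap)        # 0.625
```

Because the gap depends only on per-subgroup AUCs, the subgroup labels can just as well be intersectional (for instance, a tuple of gender and SES values per example), and any model that outputs a score can be evaluated this way. A subgroup whose test data contains only one class has no defined AUC; this sketch raises an error in that case rather than guessing a value.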