Modern machine learning increasingly supports paradigms that are multi-institutional (using data from multiple institutions during training) or cross-institutional (using models from multiple institutions for inference), but the empirical effects of these paradigms are not well understood. This study investigates cross-institutional learning via an empirical case study in higher education. We propose a framework and metrics for assessing the utility and fairness of student dropout prediction models that are transferred across institutions. We examine the feasibility of cross-institutional transfer under real-world data- and model-sharing constraints, quantifying model biases for intersectional student identities, characterizing potential disparate impact due to these biases, and investigating the impact of various cross-institutional ensembling approaches on fairness and overall model performance. We perform this analysis on data representing over 200,000 enrolled students annually from four universities, without sharing training data between institutions. We find that a simple zero-shot cross-institutional transfer procedure can achieve performance similar to that of locally trained models for all institutions in our study, without sacrificing model fairness. We also find that stacked ensembling provides no additional benefits to overall performance or fairness compared to either a local model or the zero-shot transfer procedure we tested. We find no evidence of a fairness-accuracy tradeoff across the dozens of models and transfer schemes evaluated. Our auditing procedure also highlights the importance of intersectional fairness analysis, revealing performance disparities at the intersection of sensitive identity groups that are concealed under one-dimensional analysis.
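To illustrate how a one-dimensional audit can conceal intersectional disparities, the following minimal sketch uses a small synthetic dataset. It is not the study's auditing procedure: the attribute names `attr_a` and `attr_b`, the helper functions, the toy records, and the choice of accuracy as the metric are all assumptions made for illustration only.

```python
from collections import defaultdict


def group_accuracy(records, keys):
    """Return accuracy per subgroup; a subgroup is the tuple of values for `keys`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        group = tuple(r[k] for k in keys)
        totals[group] += 1
        hits[group] += int(r["y_true"] == r["y_pred"])
    return {g: hits[g] / totals[g] for g in totals}


def max_gap(acc_by_group):
    """Largest difference in accuracy between any two subgroups."""
    values = list(acc_by_group.values())
    return max(values) - min(values)


def make_records(attr_a, attr_b, n_correct, n_total):
    """Synthetic records (hypothetical): the first n_correct predictions are right."""
    return [
        {"attr_a": attr_a, "attr_b": attr_b,
         "y_true": 1, "y_pred": 1 if i < n_correct else 0}
        for i in range(n_total)
    ]


# Toy data: errors are concentrated in two of the four intersectional cells.
records = (
    make_records("a1", "b1", 4, 4)
    + make_records("a1", "b2", 2, 4)
    + make_records("a2", "b1", 2, 4)
    + make_records("a2", "b2", 4, 4)
)

# One-dimensional audits: each attribute is examined on its own.
gap_a = max_gap(group_accuracy(records, ["attr_a"]))
gap_b = max_gap(group_accuracy(records, ["attr_b"]))

# Intersectional audit: subgroups are defined by both attributes jointly.
gap_ab = max_gap(group_accuracy(records, ["attr_a", "attr_b"]))

print(f"gap by attr_a alone: {gap_a:.2f}")
print(f"gap by attr_b alone: {gap_b:.2f}")
print(f"intersectional gap:  {gap_ab:.2f}")
```

In this toy data each single-attribute audit reports no gap, because every marginal group has 75% accuracy, while the joint audit shows a 50-point gap between the best and worst intersectional cells.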