Machine learning models are often used to predict the outcomes of admissions processes, such as those for colleges or jobs. However, such decision processes differ substantially from the conventional machine learning paradigm. Because admissions decisions are capacity-constrained, whether a student is admitted depends on who else applies. We show how this dependence affects predictive performance even in otherwise ideal settings. Theoretically, we introduce two concepts that characterize the relationship between admission function properties, machine learning representation, and generalization under shifts in the applicant pool distribution: instability, which measures how many existing decisions can change when a single new applicant is introduced; and variability, which measures the number of unique students whose decisions can change. Empirically, we illustrate our theory on individual-level admissions data from the New York City high school matching system, showing that machine learning performance degrades as the applicant pool increasingly differs from the training data. Furthermore, performance drops are larger for schools using decision rules that are more unstable and variable. Our work raises questions about the reliability of predicting individual admissions probabilities.
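As a rough illustration of the capacity constraint described above, the following sketch counts how many existing decisions change when one new applicant joins the pool. It is an assumption-laden toy, not the paper's formal definition of instability: the top-k rule, the function names `admit_top_k` and `decisions_changed`, and the example scores are all hypothetical choices made for illustration.

```python
from typing import List


def admit_top_k(scores: List[float], capacity: int) -> List[bool]:
    """Hypothetical admission rule: admit the `capacity` highest scores.

    Ties are broken in favor of the applicant with the lower index.
    """
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    admitted = set(order[:capacity])
    return [i in admitted for i in range(len(scores))]


def decisions_changed(scores: List[float], new_score: float, capacity: int) -> int:
    """Count existing applicants whose decision flips when one applicant is added."""
    before = admit_top_k(scores, capacity)
    after = admit_top_k(scores + [new_score], capacity)
    # Compare only the original applicants; the newcomer is the last entry.
    return sum(b != a for b, a in zip(before, after[: len(scores)]))


# Toy applicant pool with two seats (illustrative numbers, not real data).
pool = [0.9, 0.7, 0.5, 0.3]
changed_by_strong = decisions_changed(pool, 0.8, capacity=2)
changed_by_weak = decisions_changed(pool, 0.1, capacity=2)
print(changed_by_strong, changed_by_weak)
```

Under this simple rule, a strong newcomer displaces exactly one admitted applicant and a weak newcomer changes nothing, so a model that scores each applicant in isolation cannot know the outcome without knowing the rest of the pool. Other admission rules may behave quite differently.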