Differential item functioning (DIF) is a widely used statistical notion for identifying test items that may disadvantage specific groups of test-takers. These groups are often defined by non-manipulable characteristics, such as gender, race/ethnicity, or English-language learner (ELL) status. Although DIF can be framed as a causal fairness problem by treating group membership as the treatment variable, doing so invokes the long-standing controversy over how to interpret causal effects of non-manipulable treatments. To better identify and interpret the causal sources of DIF, this study adopts an interventionist approach based on the treatment decomposition proposed by Robins and Richardson (2010). Under this framework, a non-manipulable treatment is decomposed into intervening variables. For example, ELL status can be decomposed into English vocabulary unfamiliarity and classroom learning barriers, each of which influences the outcome through a different causal pathway. We formally define separable DIF effects associated with these decomposed components, distinguishing settings with and without item impact, and provide causal identification strategies for each effect. We then apply the framework to biased test items from the SAT and Regents exams. We also propose formal detection procedures based on causal machine learning, namely causal forests and Bayesian additive regression trees (BART), and evaluate their performance in a simulation study. Finally, we discuss the implications of adopting interventionist approaches in educational testing practice.
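The decomposition idea can be illustrated with a small Monte Carlo sketch. It assumes a hypothetical structural model, not the one used in this study. ELL status is split into a vocabulary component `a_vocab` that lowers the probability of answering one item correctly directly (a DIF pathway), and a classroom component `a_class` that lowers latent ability θ (an item-impact pathway). In the observed population both components equal ELL status; setting them to different values defines separable effects, such as the effect of vocabulary unfamiliarity with classroom barriers held at 0. The coefficients and the function name `mean_response` are illustrative assumptions. Identification from observed data and detection with causal forests or BART are not shown.

```python
import math
import random


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def mean_response(a_vocab: int, a_class: int, n: int = 50_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[Y] when the two (hypothetical) components of
    ELL status are set to (a_vocab, a_class) under an illustrative model."""
    rng = random.Random(seed)  # same seed -> same draws across calls (common random numbers)
    impact = -0.5      # hypothetical: classroom barriers shift latent ability down (item impact)
    dif = -0.8         # hypothetical: vocabulary unfamiliarity lowers success on this item directly (DIF)
    difficulty = 0.0   # hypothetical item difficulty (Rasch-type item)
    total = 0.0
    for _ in range(n):
        theta = rng.gauss(impact * a_class, 1.0)               # latent ability
        total += logistic(theta - difficulty + dif * a_vocab)  # P(Y = 1 | theta, components)
    return total / n


if __name__ == "__main__":
    base = mean_response(0, 0)
    print("separable vocabulary (DIF) effect:", mean_response(1, 0) - base)
    print("separable classroom (impact) effect:", mean_response(0, 1) - base)
    print("total effect of ELL status:", mean_response(1, 1) - base)
```

Averaging P(Y = 1) instead of sampling binary responses reduces Monte Carlo noise without changing the expectation. Reusing the same seed makes contrasts between settings of the components compare identical ability draws. The total effect then splits exactly into a classroom effect (vocabulary held at 0) plus a vocabulary effect (classroom barriers held at 1).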