This study investigates the detection of differential item functioning (DIF) in computerized adaptive testing (CAT) using multilevel modeling. We argue that traditional DIF methods are ineffective in CAT because of the hierarchical nature of the data. Our proposed two-level model accounts for dependencies between items through provisional ability estimates. Simulations showed that the model outperformed the other methods in both Type I error control and power, particularly under high item exposure rates and with longer tests. For future research, we recommend expanding item pools, incorporating item parameters, and exploring Bayesian estimation to further improve DIF detection in CAT. Balancing model complexity against convergence remains a key challenge for obtaining robust results.