This paper proposes a method for assessing differential item functioning (DIF) in item response theory (IRT) models. Its main virtue is that it does not require pre-specification of anchor items. The method is developed in two steps: first by showing how DIF can be reformulated as a problem of outlier detection in IRT-based scaling, and then by tackling the latter using established methods from robust statistics. The proposal is a redescending M-estimator of IRT scaling parameters that is tuned to flag items with DIF at the desired asymptotic Type I error rate. One way of quantifying the robustness of the estimator is its finite-sample breakdown point, which is shown to equal 1/2 (i.e., the estimator remains bounded whenever fewer than half of the items on an assessment exhibit DIF). This theoretical result is complemented by simulation studies that illustrate the performance of the estimator and its associated test of DIF. The simulation studies show that the proposed method compares favorably to currently available approaches, and a real-data example illustrates its application in a research context where pre-specification of anchor items is infeasible. The focus of the paper is the two-parameter logistic model in two independent groups, with extensions to other settings considered in the conclusion.
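The idea of treating DIF as outliers in a robust scaling problem can be sketched in a deliberately simplified setting. The sketch below assumes that each item supplies a single between-group difference in an item parameter, with a known standard error, so that items without DIF scatter around one common scaling value and items with DIF appear as outliers. It estimates that common value with a Tukey bisquare (redescending) M-estimator via iteratively reweighted least squares, starting from the median, and sets the bisquare cutoff to the two-sided normal critical value for the nominal Type I error rate. The function names, the one-parameter setup, and this tuning rule are illustrative assumptions, not the paper's estimator for the two-parameter logistic model, and the final item-level test ignores the sampling variability of the scaling estimate.

```python
from statistics import NormalDist, median


def bisquare_weight(u: float, c: float) -> float:
    """Tukey bisquare weight: (1 - (u/c)^2)^2 for |u| < c, and 0 otherwise."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def robust_scaling(diffs, ses, alpha=0.05, max_iter=100, tol=1e-10):
    """Hypothetical sketch: bisquare M-estimate of a common scaling value.

    diffs: per-item between-group differences in an item parameter (assumed input)
    ses:   known standard errors of those differences (assumed)
    Returns the estimate and the indices of items flagged as DIF.
    """
    # Tie the bisquare cutoff to the two-sided critical value (assumed tuning rule).
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    theta = median(diffs)  # high-breakdown starting value
    for _ in range(max_iter):
        # IRLS step: solve sum_i w(u_i) * (d_i - theta) / s_i^2 = 0 for theta.
        w = [bisquare_weight((d - theta) / s, c) / s ** 2 for d, s in zip(diffs, ses)]
        total = sum(w)
        if total == 0.0:
            break  # every item rejected; keep the current value
        new_theta = sum(wi * d for wi, d in zip(w, diffs)) / total
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    # Simplified item-level z-test: ignores the sampling variability of theta.
    flagged = [i for i, (d, s) in enumerate(zip(diffs, ses)) if abs(d - theta) / s > c]
    return theta, flagged


if __name__ == "__main__":
    diffs = [0.28, 0.31, 0.30, 0.29, 0.32, 0.30, 0.31, 1.50, 1.45, 1.60]
    ses = [0.1] * len(diffs)
    theta, flagged = robust_scaling(diffs, ses)
    print(f"robust estimate: {theta:.3f}, flagged items: {flagged}")
    print(f"ordinary mean:   {sum(diffs) / len(diffs):.3f}")
```

With seven items clustered near 0.30 and three shifted items, the robust estimate stays near 0.30 and only the three shifted items (indices 7, 8, 9) are flagged, whereas the ordinary mean of the ten differences is pulled up to about 0.67. Starting from the median matters because redescending estimators can have multiple roots; a high-breakdown starting value is a common way to select a sensible one.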