Latent variable models are widely used to measure latent factors (e.g., abilities and personalities) from large-scale assessment data. Beyond understanding these latent factors, the effect of covariates on responses, controlling for the latent factors, is also of great scientific interest and has wide applications. One example is evaluating the fairness of educational testing, where the covariate effect reflects whether a test question is biased with respect to certain individual characteristics (e.g., gender and race) after accounting for the test takers' latent abilities. However, large sample sizes and long tests pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the commonly encountered discrete responses, nonlinear latent factor models are often assumed, which adds further complexity. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions that resolve the identifiability issue. Based on these identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality for the covariate effects. Furthermore, we derive estimation and inference results for the latent factors and the factor loadings. We illustrate the finite-sample performance of the proposed method through extensive numerical studies and an application to an educational assessment dataset from the Programme for International Student Assessment (PISA).