Monte Carlo simulation is the primary methodology for evaluating Item Response Theory (IRT) methods, yet marginal reliability, the fundamental measure of how informative the data are, is rarely treated as an explicit design factor. Unlike multilevel modeling, where the intraclass correlation (ICC) is routinely manipulated, IRT studies typically treat reliability as an incidental outcome. This "reliability omission" obscures the signal-to-noise ratio of the generated data. To address this gap, we introduce a principled framework for reliability-targeted simulation that turns reliability from an implicit by-product into a precise input parameter. We formalize the inverse design problem: solving for the global discrimination scaling factor that uniquely achieves a pre-specified target reliability. We propose two complementary algorithms: Empirical Quadrature Calibration (EQC) for rapid, deterministic precision, and Stochastic Approximation Calibration (SAC) for rigorous stochastic estimation. A comprehensive validation study across 960 conditions demonstrates that EQC achieves essentially exact calibration, while SAC remains unbiased across non-normal latent distributions and empirical item pools. Furthermore, we clarify the theoretical distinction between average-information and error-variance-based reliability metrics, showing that, because of Jensen's inequality, they require different calibration scales. An accompanying open-source R package, IRTsimrel, enables researchers to standardize reliability as a controlled experimental input.
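The inverse design problem described above can be illustrated with a minimal Python sketch. It is not the IRTsimrel API and not the EQC or SAC algorithm as implemented by the authors. Everything in it is an assumption made for illustration: a 2PL model, a standard normal latent trait approximated on a fixed grid, a hypothetical 20-item pool, plain bisection as the root finder, and the two reliability definitions, average-information reliability E[I] / (E[I] + 1) and error-variance-based reliability 1 / (1 + E[1/I]), with latent variance fixed at 1.

```python
import math

def normal_grid(n=241, lo=-6.0, hi=6.0):
    """Equally spaced nodes with normalized standard-normal weights (assumed latent N(0, 1))."""
    step = (hi - lo) / (n - 1)
    nodes = [lo + k * step for k in range(n)]
    dens = [math.exp(-0.5 * t * t) for t in nodes]
    total = sum(dens)
    return nodes, [d / total for d in dens]

def information(theta, c, a, b):
    """2PL test information at theta when every discrimination a_j is scaled by c."""
    total = 0.0
    for a_j, b_j in zip(a, b):
        e = math.exp(-abs(c * a_j * (theta - b_j)))  # numerically stable p * (1 - p)
        total += (c * a_j) ** 2 * e / (1.0 + e) ** 2
    return total

def reliability(c, a, b, nodes, weights, metric="info"):
    """Marginal reliability under the two (assumed) definitions, latent variance = 1."""
    infos = [information(t, c, a, b) for t in nodes]
    if metric == "info":   # average-information version
        mean_info = sum(w * i for w, i in zip(weights, infos))
        return mean_info / (mean_info + 1.0)
    # error-variance version: average the error variance 1 / I(theta)
    mean_err = sum(w / i for w, i in zip(weights, infos))
    return 1.0 / (1.0 + mean_err)

def calibrate(target, a, b, nodes, weights, lo=0.05, hi=5.0, iters=60):
    """Bisection for the scale c whose average-information reliability equals target."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reliability(mid, a, b, nodes, weights, "info") < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical 20-item pool: discriminations in [0.8, 1.5], difficulties in [-2, 2].
n_items = 20
a = [0.8 + 0.7 * j / (n_items - 1) for j in range(n_items)]
b = [-2.0 + 4.0 * j / (n_items - 1) for j in range(n_items)]

nodes, weights = normal_grid()
target = 0.80
c_star = calibrate(target, a, b, nodes, weights)
rho_info = reliability(c_star, a, b, nodes, weights, "info")
rho_mse = reliability(c_star, a, b, nodes, weights, "mse")

print(f"scaling factor c* = {c_star:.4f}")
print(f"average-information reliability = {rho_info:.4f}")
print(f"error-variance-based reliability = {rho_mse:.4f}")
```

At the calibrated scale, the error-variance-based value cannot exceed the average-information value, because Jensen's inequality gives E[1/I] ≥ 1/E[I]. Under these assumptions, a scale tuned to one metric therefore does not hit the same target on the other. The bisection bracket relies on the average-information reliability increasing with the scale over the chosen interval, which holds for this illustrative pool but is not checked in general.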