Sparse Bayesian Multidimensional Item Response Theory

Multidimensional Item Response Theory (MIRT) is widely sought after by applied researchers looking for interpretable (sparse) explanations of response patterns in questionnaire data. There is, however, an unmet demand for such sparsity discovery tools in practice. Our paper develops a Bayesian platform for binary and ordinal item MIRT which requires minimal tuning and, because it is parallelizable, scales well to relatively large datasets. Bayesian methodology for MIRT models has traditionally relied on MCMC simulation, which can not only be slow in practice but also often renders exact sparsity recovery impossible without additional thresholding. In this work, we develop a scalable Bayesian EM algorithm to estimate sparse factor loadings from binary and ordinal item responses. We address the seemingly insurmountable problem of unknown latent factor dimensionality with tools from Bayesian nonparametrics, which enable estimating the number of factors. Rotations to sparsity through parameter expansion further enhance convergence and interpretability without identifiability constraints. In our simulation study, we show that our method reliably recovers both the factor dimensionality and the latent structure on high-dimensional synthetic data, even for small samples. We demonstrate the practical usefulness of our approach on two datasets: an educational item response dataset and a quality-of-life measurement dataset. Both demonstrations show that our tool yields interpretable estimates, facilitating interesting discoveries that might otherwise go unnoticed in a purely confirmatory factor analysis setting. We provide easy-to-use software, which is a useful new addition to the MIRT toolkit and which we hope will serve as a go-to method for practitioners.
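To make the setting concrete, the following minimal sketch simulates binary item responses from a MIRT model with a sparse loading matrix. It is an illustration only, not the estimation method of the paper: the probit link, the simple-structure loading pattern, the parameter ranges, and the function name `simulate_sparse_mirt` are all assumptions chosen for the example.

```python
import numpy as np


def simulate_sparse_mirt(n_persons=200, n_items=12, n_factors=3, seed=0):
    """Simulate binary responses from a sparse MIRT model (illustrative sketch).

    Assumptions (not taken from the paper): probit link, one nonzero
    loading per item, standard normal latent traits.
    """
    rng = np.random.default_rng(seed)

    # Sparse loading matrix: each item loads on exactly one factor.
    loadings = np.zeros((n_items, n_factors))
    for j in range(n_items):
        loadings[j, j % n_factors] = rng.uniform(0.8, 1.5)

    # Item intercepts and person-level latent traits.
    intercepts = rng.normal(0.0, 0.5, size=n_items)
    theta = rng.standard_normal((n_persons, n_factors))

    # Probit link via a latent continuous variable: y = 1 if z > 0,
    # where z = theta @ loadings' + intercept + standard normal noise.
    linear = theta @ loadings.T + intercepts
    z = linear + rng.standard_normal(linear.shape)
    responses = (z > 0).astype(int)
    return responses, loadings, theta


responses, loadings, theta = simulate_sparse_mirt()
print(responses.shape)                   # persons x items
print(np.count_nonzero(loadings, axis=1))  # nonzero loadings per item
```

A sparse estimation method would aim to recover the zero pattern of `loadings`, and the number of its columns, from `responses` alone.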
