Sparse Bayesian Multidimensional Item Response Theory

Multidimensional Item Response Theory (MIRT) is widely sought after by applied researchers looking for interpretable (sparse) explanations of response patterns in questionnaire data. In practice, however, the demand for such sparsity discovery tools remains unmet. Our paper develops a Bayesian platform for binary and ordinal item MIRT that requires minimal tuning and, because it is parallelizable, scales well to large datasets. Bayesian methodology for MIRT models has traditionally relied on MCMC simulation, which not only can be slow in practice but also often makes exact sparsity recovery impossible without additional thresholding. In this work, we develop a scalable Bayesian EM algorithm to estimate sparse factor loadings from mixed continuous, binary, and ordinal item responses. We address the seemingly insurmountable problem of unknown latent factor dimensionality with tools from Bayesian nonparametrics, which enable estimation of the number of factors. Rotations to sparsity through parameter expansion further enhance convergence and interpretability without identifiability constraints. In our simulation study, we show that our method reliably recovers both the factor dimensionality and the latent structure on high-dimensional synthetic data, even for small samples. We demonstrate the practical usefulness of our approach on three datasets: an educational assessment dataset, a quality-of-life measurement dataset, and a bio-behavioral dataset. All three demonstrations show that our tool yields interpretable estimates, facilitating interesting discoveries that might otherwise go unnoticed in a purely confirmatory factor analysis setting.
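
To make concrete what "sparse factor loadings" means for binary item responses, the following is a minimal simulation sketch, not the paper's algorithm. It assumes a probit link written in latent-variable form, which the abstract does not specify, and a loading matrix in which each item loads on exactly one factor. The function name `simulate_sparse_mirt`, the sizes, and the parameter ranges are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_sparse_mirt(n_persons=200, n_items=12, n_factors=3, seed=0):
    """Simulate binary responses from a sparse MIRT model (illustrative only).

    Assumptions not taken from the source: probit link, one nonzero
    loading per item, standard normal latent traits.
    """
    rng = np.random.default_rng(seed)

    # Sparse loading matrix: item j loads only on factor (j mod n_factors).
    loadings = np.zeros((n_items, n_factors))
    for j in range(n_items):
        loadings[j, j % n_factors] = rng.uniform(0.8, 1.5)

    # Item intercepts (hypothetical scale).
    intercepts = rng.normal(0.0, 0.5, size=n_items)

    # Latent traits, one row per respondent.
    theta = rng.standard_normal((n_persons, n_factors))

    # Probit model in latent-variable form: an item is endorsed when
    # theta @ loadings.T + intercept + standard normal noise exceeds zero.
    noise = rng.standard_normal((n_persons, n_items))
    latent = theta @ loadings.T + intercepts + noise
    responses = (latent > 0).astype(int)

    return responses, loadings, theta


responses, loadings, theta = simulate_sparse_mirt()
print(responses.shape)                   # persons x items
print((loadings != 0).sum(axis=1))       # nonzero loadings per item
```

In this setup a sparse estimation method would aim to recover the zero pattern of `loadings` and the number of columns from `responses` alone. The one-factor-per-item pattern is only the simplest case of sparsity, and items may load on several factors in real data.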
