Autoencoded sparse Bayesian in-IRT factorization, calibration, and amortized inference for the Work Disability Functional Assessment Battery

The Work Disability Functional Assessment Battery (WD-FAB) is a multidimensional item response theory (IRT) instrument designed for assessing work-related mental and physical function based on responses to an item bank. In prior iterations it was developed using traditional means: linear factorization and null hypothesis statistical testing for item partitioning and selection, followed by post hoc calibration of disjoint unidimensional IRT models. As a result, the WD-FAB, like many other IRT instruments, is a post hoc model. Its item partitioning, based on exploratory factor analysis, is blind to the final nonlinear IRT model and is not performed in a manner consistent with goodness of fit to that model. In this manuscript, we develop a Bayesian hierarchical model that performs the following tasks simultaneously and self-consistently: scale factorization, item selection, parameter identification, and response scoring. This method uses sparsity-based shrinkage to obviate the linear factorization and null hypothesis statistical tests that are usually required for developing multidimensional IRT models, so that item partitioning is consistent with the ultimate nonlinear factor model. We also analogize our multidimensional IRT model to probabilistic autoencoders, specifying an encoder function that amortizes the inference of ability parameters from item responses. The encoder function is equivalent to the "VBE" step in a stochastic variational Bayesian expectation maximization (VBEM) procedure that we use for approximate Bayesian inference on the entire model. We apply the method to a sample of WD-FAB item responses and compare the resulting item discriminations to those obtained using the traditional post hoc method.
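
The autoencoder analogy described above can be illustrated with a minimal sketch. It is not the model developed in this manuscript: it assumes binary responses under a two-parameter logistic (2PL) decoder, a hand-set sparse discrimination matrix in place of one learned through shrinkage priors, and a hypothetical linear encoder. All function names, shapes, and parameter values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode(theta, discrimination, difficulty):
    """Decoder: response probabilities under an assumed multidimensional 2PL model.

    theta:          (people, dims) ability parameters
    discrimination: (items, dims) loadings; sparse rows assign items to scales
    difficulty:     (items,) item difficulties
    """
    return sigmoid(theta @ discrimination.T - difficulty)

def encode(responses, weights, bias):
    """Hypothetical linear encoder: maps item responses to ability estimates."""
    return responses @ weights + bias

def log_likelihood(responses, theta, discrimination, difficulty):
    """Bernoulli log-likelihood of the responses given abilities and item parameters."""
    p = decode(theta, discrimination, difficulty)
    return np.sum(responses * np.log(p) + (1.0 - responses) * np.log1p(-p))

rng = np.random.default_rng(0)
people, items, dims = 5, 6, 2

# Assumed sparse structure: items 0-2 load on dimension 0, items 3-5 on dimension 1.
discrimination = np.zeros((items, dims))
discrimination[:3, 0] = 1.5
discrimination[3:, 1] = 1.2
difficulty = np.zeros(items)

# Toy binary responses (random placeholders, not WD-FAB data).
responses = rng.integers(0, 2, size=(people, items)).astype(float)

# Toy encoder weights tied to the discriminations, purely for illustration.
weights = 0.5 * discrimination
theta_hat = encode(responses - 0.5, weights, np.zeros(dims))

probs = decode(theta_hat, discrimination, difficulty)
ll = log_likelihood(responses, theta_hat, discrimination, difficulty)
```

In this sketch the zeros in `discrimination` play the role of the item partitioning: each item contributes to the likelihood only through the dimensions on which it loads, and the encoder returns ability estimates in a single pass rather than through per-person optimization.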
