Ordinal categorical data are routinely encountered in many practical applications. When the primary goal is to construct a regression model for ordinal outcomes, cumulative link models represent one of the most popular choices to link the cumulative probabilities of the response with a set of covariates through a parsimonious linear predictor, shared across response categories. As the number of observations grows, standard sampling algorithms for Bayesian inference scale poorly, making posterior computation increasingly challenging for large datasets. In this article, we propose three scalable algorithms for approximating the posterior distribution of the regression coefficients in cumulative probit models relying on Variational Bayes and Expectation Propagation. We compare the proposed approaches with inference based on Markov Chain Monte Carlo, demonstrating superior computational performance and remarkable accuracy. Finally, we illustrate the utility of the proposed algorithms on a challenging case study to investigate the structure of a criminal network.
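As a minimal illustration of the model class described above, the following sketch computes category probabilities under a cumulative probit model, where P(y <= k | x) = Phi(alpha_k - x'beta) and the same coefficient vector beta is shared across all categories. The function names, the threshold notation alpha_k, the sign convention, and the numeric values are assumptions chosen for illustration. They are not taken from the article, and the sketch covers only the likelihood, not the proposed approximation algorithms.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cumulative_probit_probs(x, beta, thresholds):
    """Category probabilities for one observation under a cumulative probit model.

    Assumed parameterisation (illustrative): P(y <= k | x) = Phi(alpha_k - x'beta),
    with increasing thresholds alpha_1 < ... < alpha_{K-1} and a single linear
    predictor x'beta shared across the K response categories.
    """
    # Shared linear predictor.
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    # Cumulative probabilities for categories 1..K-1; the last one is 1 by definition.
    cumulative = [std_normal_cdf(a - eta) for a in thresholds] + [1.0]
    # Category probabilities are differences of consecutive cumulative probabilities.
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical example: two covariates, four ordered categories.
probs = cumulative_probit_probs(
    x=[0.5, -1.0],
    beta=[1.2, 0.4],
    thresholds=[-1.0, 0.0, 1.5],
)
print(probs)
```

Because only the thresholds vary with the category, the number of regression parameters does not grow with the number of response levels, which is the parsimony the cumulative link formulation refers to.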