Modeling binary and categorical data is one of the most common tasks for applied statisticians and econometricians. While Bayesian methods in this context have been available for decades, they often require a high level of familiarity with Bayesian statistics or suffer from issues such as low sampling efficiency. To make Bayesian models for binary and categorical data more accessible, we introduce novel latent variable representations based on Pólya-Gamma random variables for a range of commonly encountered logistic regression models. From these latent variable representations, we derive new Gibbs sampling algorithms for binary, binomial, and multinomial logit models. All models admit a conditionally Gaussian likelihood representation, which makes extensions to more complex modeling frameworks, such as state space models, straightforward. However, sampling efficiency may still be an issue in these data-augmentation-based estimation frameworks. To counteract this, we develop and discuss in detail novel marginal data augmentation strategies. The merits of our approach are illustrated through extensive simulations and real data applications.
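The conditionally Gaussian Gibbs step described above can be illustrated with the standard Pólya-Gamma augmentation for a binary logit model (Polson, Scott and Windle, 2013), which underlies, but is not identical to, the novel representations introduced here. The following is a minimal Python/NumPy sketch, assuming a Gaussian prior β ~ N(0, 100·I); the helper names `sample_pg_approx` and `pg_gibbs_logit` are hypothetical. Given β, each latent ω_i is drawn from PG(1, x_iᵀβ); given ω, the likelihood is Gaussian in β, so β | ω, y ~ N(m, V) with V = (XᵀΩX + B₀⁻¹)⁻¹ and m = V Xᵀκ, where κ_i = y_i − 1/2. Pólya-Gamma draws are approximated here by truncating their infinite sum-of-gammas representation rather than using an exact sampler, and the marginal data augmentation strategies are not shown.

```python
import numpy as np


def sample_pg_approx(b, c, rng, n_terms=200):
    """Approximate draws from PG(b, c_i), one per entry of c, by truncating
        omega = 1/(2 pi^2) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1). Dropping the (positive) tail terms slightly
    underestimates the mean; this is an approximation, not an exact sampler."""
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4.0 * np.pi ** 2)  # shape (n, K)
    g = rng.gamma(shape=b, scale=1.0, size=denom.shape)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)


def pg_gibbs_logit(X, y, n_iter=1000, burn_in=200, prior_var=100.0, seed=0):
    """Gibbs sampler for binary logit with assumed prior beta ~ N(0, prior_var * I)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = y - 0.5                       # kappa_i = y_i - 1/2
    prior_prec = np.eye(p) / prior_var    # B0^{-1}
    beta = np.zeros(p)
    draws = np.empty((n_iter - burn_in, p))
    for it in range(n_iter):
        # Step 1: omega_i | beta ~ PG(1, x_i' beta)  (approximate draw)
        omega = sample_pg_approx(1.0, X @ beta, rng)
        # Step 2: beta | omega, y ~ N(m, V) with V^{-1} = X' Omega X + B0^{-1}
        prec = X.T @ (omega[:, None] * X) + prior_prec
        L = np.linalg.cholesky(prec)              # prec = L L'
        m = np.linalg.solve(prec, X.T @ kappa)    # m = V X' kappa
        z = rng.standard_normal(p)
        beta = m + np.linalg.solve(L.T, z)        # Cov = (L L')^{-1} = V
        if it >= burn_in:
            draws[it - burn_in] = beta
    return draws


# Usage on simulated data (illustrative only).
data_rng = np.random.default_rng(42)
n = 400
X = np.column_stack([np.ones(n), data_rng.standard_normal(n)])
beta_true = np.array([-0.5, 1.0])
prob = 1.0 / (1.0 + np.exp(-X @ beta_true))
y = (data_rng.uniform(size=n) < prob).astype(float)
draws = pg_gibbs_logit(X, y, n_iter=1000, burn_in=200)
print(draws.mean(axis=0))  # posterior means, expected to lie near beta_true
```

Sampling β through a Cholesky factor of the posterior precision avoids forming V explicitly. Because every β update is a Gaussian draw, the same step could in principle be swapped for a Kalman-filter-based update in a state space extension.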