Bayesian multinomial logistic regression provides a principled, interpretable approach to multiclass classification, but posterior sampling becomes increasingly expensive as the model dimension grows. Prior work has studied scalability in the number of subjects and covariates; in contrast, this paper focuses on how computational cost changes as the number of outcome categories increases. To improve scalability when there are many categories, we adapt a gamma-augmentation strategy that decouples the category-specific coefficient updates: each category's coefficients can be updated conditional on a single auxiliary variable per subject, rather than on the coefficients of all other categories. Because the resulting coefficient conditionals are non-conjugate, we pair this augmentation with either adaptive Metropolis-Hastings or elliptical slice sampling. Through simulation and a real-data example, we compare the resulting samplers with several standard competitors in terms of effective sample size and effective sampling rate. We find that the best-performing sampler depends on the dimension and imbalance regime, and that the proposed augmentation provides substantial speedups when the number of categories is large.
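
The decoupling can be illustrated with a standard integral identity. What follows is a sketch under assumed notation that does not appear in the abstract: $K$ categories, linear predictors $\eta_{ik} = x_i^\top \beta_k$, one auxiliary variable $u_i$ per subject, and a generic prior $p(\beta_k)$. The paper's exact construction (for example, its treatment of a reference category or its choice of gamma shape) may differ.

```latex
% Identity: for any S > 0,  1/S = \int_0^\infty e^{-uS}\,du.
% Apply it to the softmax denominator S_i = \sum_{k=1}^{K} e^{\eta_{ik}}.
\begin{align*}
p(y_i \mid \beta)
  &= \frac{e^{\eta_{i y_i}}}{\sum_{k=1}^{K} e^{\eta_{ik}}}
   = \int_0^\infty e^{\eta_{i y_i}}
     \exp\!\Big(-u_i \sum_{k=1}^{K} e^{\eta_{ik}}\Big)\,du_i, \\[4pt]
p(y_i, u_i \mid \beta)
  &= \prod_{k=1}^{K}
     \exp\!\big(\mathbf{1}\{y_i = k\}\,\eta_{ik} - u_i\,e^{\eta_{ik}}\big), \\[4pt]
u_i \mid \beta, y
  &\sim \mathrm{Gamma}\Big(1,\ \textstyle\sum_{k=1}^{K} e^{\eta_{ik}}\Big)
  \quad \text{(shape, rate)}, \\[4pt]
p(\beta_k \mid u, y)
  &\propto p(\beta_k)\,
     \prod_{i=1}^{n}
     \exp\!\big(\mathbf{1}\{y_i = k\}\,x_i^\top \beta_k
                - u_i\,e^{x_i^\top \beta_k}\big).
\end{align*}
```

Under these assumptions, the augmented likelihood factorizes over $k$, so the conditional for $\beta_k$ involves the other categories only through $u_i$. With a Gaussian prior that conditional has a Poisson-like, non-conjugate form, which is consistent with the use of Metropolis-Hastings or elliptical slice sampling for the coefficient updates.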