This article focuses on inference in logistic regression for high-dimensional binary outcomes. A popular approach induces dependence across the outcomes by including latent factors in the linear predictor. Bayesian approaches are useful for characterizing uncertainty in inferring the regression coefficients, factors and loadings, while also incorporating hierarchical and shrinkage structure. However, Markov chain Monte Carlo algorithms for posterior computation face challenges in scaling to high-dimensional outcomes. Motivated by applications in ecology, we exploit a blessing of dimensionality to motivate pre-estimation of the latent factors. Conditionally on the factors, the outcomes are modeled via independent logistic regressions. We approximate the posterior of the regression coefficients and loadings with Gaussian approximations computed in parallel, and include a simple adjustment to obtain credible intervals with valid frequentist coverage. We establish posterior concentration properties and demonstrate excellent empirical performance in simulations. The methods are applied to insect biodiversity data from Madagascar.
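The two-stage idea described in the abstract (pre-estimate the latent factors, then fit one logistic regression per outcome with a Gaussian posterior approximation) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the article: the factor estimate is a plain truncated SVD of the column-centered binary matrix, the Gaussian approximation is a standard Laplace approximation under an independent N(0, prior_var) prior, and the names `pre_estimate_factors` and `laplace_logistic` are hypothetical. The article's adjustment for frequentist coverage is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def pre_estimate_factors(Y, k):
    """Stand-in factor estimate (assumption, not the article's estimator):
    top-k left singular vectors of the column-centered outcomes, scaled by sqrt(n)."""
    Yc = Y - Y.mean(axis=0, keepdims=True)
    U, _, _ = np.linalg.svd(Yc, full_matrices=False)
    return np.sqrt(Y.shape[0]) * U[:, :k]

def laplace_logistic(Z, y, prior_var=10.0, n_iter=50, tol=1e-8):
    """Laplace (Gaussian) approximation to the posterior of a logistic
    regression with an independent N(0, prior_var) prior on each coefficient.
    Returns the posterior mode and the inverse negative Hessian at the mode."""
    d = Z.shape[1]
    theta = np.zeros(d)
    for _ in range(n_iter):
        prob = sigmoid(Z @ theta)
        grad = Z.T @ (y - prob) - theta / prior_var
        weights = prob * (1.0 - prob)
        neg_hess = (Z * weights[:, None]).T @ Z + np.eye(d) / prior_var
        step = np.linalg.solve(neg_hess, grad)  # Newton step
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    prob = sigmoid(Z @ theta)
    weights = prob * (1.0 - prob)
    neg_hess = (Z * weights[:, None]).T @ Z + np.eye(d) / prior_var
    return theta, np.linalg.inv(neg_hess)

# Simulated data from a latent-factor logistic model (illustrative sizes only).
rng = np.random.default_rng(0)
n, p, q, k = 300, 20, 2, 2            # samples, outcomes, covariates, factors
X = np.column_stack([np.ones(n), rng.normal(size=n)])
F = rng.normal(size=(n, k))           # true latent factors
B = rng.normal(scale=0.5, size=(p, q))    # regression coefficients
Lam = rng.normal(scale=0.5, size=(p, k))  # factor loadings
Y = rng.binomial(1, sigmoid(X @ B.T + F @ Lam.T)).astype(float)

# Stage 1: pre-estimate the factors from the outcome matrix.
F_hat = pre_estimate_factors(Y, k)

# Stage 2: conditionally on F_hat, each outcome is an independent logistic
# regression, so this loop could be distributed across workers.
Z = np.hstack([X, F_hat])
fits = [laplace_logistic(Z, Y[:, j]) for j in range(p)]

# Unadjusted 95% intervals for the first outcome's coefficients and loadings.
mean0, cov0 = fits[0]
half_width = 1.96 * np.sqrt(np.diag(cov0))
intervals0 = np.column_stack([mean0 - half_width, mean0 + half_width])
```

Because the per-outcome fits share the design matrix `Z` but nothing else, the list comprehension in stage 2 can be replaced by any parallel map without changing the results. The intervals computed at the end are the raw Laplace intervals and should not be read as having the coverage guarantee the abstract refers to.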