We investigate Bayes posterior distributions in high-dimensional generalized linear models (GLMs) under the proportional asymptotics regime, where the number of features and samples diverge at a comparable rate. Specifically, we characterize the limiting behavior of finite-dimensional marginals of the posterior. We establish that the posterior does not contract in this setting. Yet, the finite-dimensional posterior marginals converge to Gaussian tilts of the prior, where the mean of the Gaussian depends on the true signal coordinates of interest. Notably, the effect of the prior survives even in the limit of large samples and dimensions. We further characterize the behavior of the posterior mean and demonstrate that the posterior mean can strictly outperform the maximum likelihood estimate in mean-squared error in natural examples. Importantly, our results hold regardless of the sparsity level of the underlying signal. On the technical front, we introduce leave-one-out strategies for studying these marginals that may be of independent interest for analyzing low-dimensional functionals of high-dimensional signals in other Bayesian inference problems.
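To make the phrase "Gaussian tilt of the prior" concrete, the following is a minimal numerical sketch, not the construction used in the paper. It assumes a one-dimensional tilt of the form pi(b) * exp(-(b - center)^2 / (2 * scale^2)), normalized on a grid. The names `gaussian_tilt`, `tilt_mean`, `center`, and `scale` are hypothetical, and how the center and scale relate to the true signal coordinate and the model is not specified here.

```python
import numpy as np


def gaussian_tilt(prior_density, grid, center, scale):
    """Normalized density on `grid` proportional to
    prior_density(b) * exp(-(b - center)^2 / (2 * scale^2)).

    `center` and `scale` are illustrative parameters (assumptions),
    not quantities taken from the paper.
    """
    log_w = -0.5 * ((grid - center) / scale) ** 2
    # Subtract the max before exponentiating for numerical stability.
    unnorm = prior_density(grid) * np.exp(log_w - log_w.max())
    dx = grid[1] - grid[0]
    return unnorm / (unnorm.sum() * dx)


def tilt_mean(prior_density, grid, center, scale):
    """Mean of the tilted density, computed by a Riemann sum on the grid."""
    dens = gaussian_tilt(prior_density, grid, center, scale)
    dx = grid[1] - grid[0]
    return float((grid * dens).sum() * dx)


def std_normal(b):
    """Standard normal prior density."""
    return np.exp(-0.5 * b ** 2) / np.sqrt(2.0 * np.pi)


def laplace(b):
    """Laplace (double exponential) prior density with unit scale."""
    return 0.5 * np.exp(-np.abs(b))


grid = np.linspace(-12.0, 12.0, 24001)
center, scale = 1.5, 0.8

mean_normal = tilt_mean(std_normal, grid, center, scale)
mean_laplace = tilt_mean(laplace, grid, center, scale)
print(mean_normal, mean_laplace)
```

For the standard normal prior the tilted density is again Gaussian, with mean center / (1 + scale^2), which gives a check on the numerics. For a non-Gaussian prior such as the Laplace density the tilted mean still depends on the prior's shape, which illustrates, in this toy setting only, how the prior's effect can persist in the limiting marginal.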