Markov chain Monte Carlo (MCMC), Laplace approximation (LA), and variational inference (VI) are popular approaches to Bayesian inference, each trading off computational cost against accuracy. However, a theoretical understanding of these trade-offs is largely missing, particularly when both the sample size $n$ and the dimension $d$ are large. LA and Gaussian VI are justified by Bernstein-von Mises (BvM) theorems, and recent work has derived the characteristic condition $n\gg d^2$ for their validity, improving over the earlier condition $n\gg d^3$. In this paper, we show for linear, logistic, and Poisson regression that, whenever $n\gtrsim d$, MCMC attains the same complexity scaling in $n$ and $d$ as first-order optimization algorithms, up to sub-polynomial factors. Thus MCMC is competitive in complexity with LA and Gaussian VI under scalings of $n$ and $d$ more general than the BvM regime. Our complexity bounds apply to appropriately scaled priors that need not have Gaussian tails, including Student-$t$ and flat priors, and to log-posteriors that need not be globally concave or gradient-Lipschitz.