In Bayesian inference, a widespread technique for computing integrals against a high-dimensional posterior is to replace the posterior with a Gaussian proxy known as the Laplace approximation. We address the accuracy of this approximation in total variation (TV) distance, in the regime in which the dimension $d$ grows with the sample size $n$. Multiple prior works have shown that the condition $d^3\ll n$ is sufficient for the approximation to be accurate. In a recent breakthrough, however, Kasprzak et al. (2022) derived an upper bound scaling as $d/\sqrt n$, so that $d^2\ll n$ suffices. In this work, we further refine the understanding of the Laplace approximation error by decomposing the TV error into a leading-order term of size $O(d/\sqrt n)$ and a remainder of size $O(d^2/n)$. This decomposition has far-reaching implications. First, we use it to show that $\mathrm{TV}\gtrsim d/\sqrt n$ for a posterior stemming from logistic regression with Gaussian design, which proves that the requirement $d^2\ll n$ cannot in general be improved. Second, the decomposition provides tighter and more easily computable upper bounds on the TV error. Our result also opens the door to proving the Bernstein-von Mises (BvM) theorem in the $d^2\ll n$ regime, and to correcting the Laplace approximation to account for skew; both directions are pursued in two follow-up works.
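For readers unfamiliar with the construction, the following is a minimal sketch of a Laplace approximation for a logistic regression posterior with Gaussian design. It is an illustration under stated assumptions, not the procedure analyzed in the work itself: the standard normal prior, the Newton solver, the function names (`sigmoid`, `neg_log_posterior_grad`, `neg_log_posterior_hess`, `laplace_approximation`), and the values of `n` and `d` are all hypothetical choices made for this example. The approximation is the Gaussian centered at the posterior mode, with covariance equal to the inverse Hessian of the negative log posterior at that mode.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def neg_log_posterior_grad(theta, X, y, prior_precision=1.0):
    """Gradient of the negative log posterior (assumed N(0, I/prior_precision) prior)."""
    return X.T @ (sigmoid(X @ theta) - y) + prior_precision * theta


def neg_log_posterior_hess(theta, X, prior_precision=1.0):
    """Hessian of the negative log posterior."""
    p = sigmoid(X @ theta)
    weights = p * (1.0 - p)
    return (X * weights[:, None]).T @ X + prior_precision * np.eye(X.shape[1])


def laplace_approximation(X, y, prior_precision=1.0, tol=1e-10, max_iter=100):
    """Return the mean (posterior mode) and covariance of the Gaussian proxy."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad = neg_log_posterior_grad(theta, X, y, prior_precision)
        hess = neg_log_posterior_hess(theta, X, prior_precision)
        step = np.linalg.solve(hess, grad)  # Newton step
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    # Covariance is the inverse Hessian evaluated at the mode.
    cov = np.linalg.inv(neg_log_posterior_hess(theta, X, prior_precision))
    return theta, cov


# Synthetic data: Gaussian design, labels drawn from the logistic model.
rng = np.random.default_rng(0)
n, d = 2000, 5  # illustrative sizes only
X = rng.standard_normal((n, d))
theta_true = np.ones(d) / np.sqrt(d)
y = (rng.random(n) < sigmoid(X @ theta_true)).astype(float)

theta_hat, cov = laplace_approximation(X, y)
```

Here `theta_hat` and `cov` define the Gaussian $N(\hat\theta, \Sigma)$ that stands in for the posterior. The sketch does not compute the TV error itself, which is the quantity the bounds above concern.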