We study wide Bayesian neural networks, focusing on the rare but statistically dominant fluctuations that govern posterior concentration beyond the Gaussian-process limit. Large-deviation theory provides explicit variational objectives (rate functions) on predictors, yielding an emerging notion of complexity and feature learning directly at the functional level. We show that the posterior output rate function is obtained by a joint optimization over predictors and internal kernels, in contrast with fixed-kernel (NNGP) theory. Numerical experiments demonstrate that the resulting predictions accurately describe finite-width behavior for moderately sized networks, capturing non-Gaussian tails, posterior deformation, and data-dependent kernel selection effects.
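To make the stated contrast with fixed-kernel (NNGP) theory concrete, a schematic form of such a joint variational principle could be sketched as follows; the notation (predictor values $f$ on the training inputs, internal kernel $K$, kernel-level rate function $\mathcal{I}_{\mathrm{ker}}$) is illustrative and is not fixed by the abstract above.
\[
I(f) \;=\; \inf_{K}\Big[\tfrac{1}{2}\, f^{\top} K^{-1} f \;+\; \mathcal{I}_{\mathrm{ker}}(K)\Big],
\]
so that, unlike the NNGP case where $K$ is frozen at its infinite-width value, the kernel is itself selected by the optimization and can adapt to the data.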