Bayesian neural networks promise calibrated uncertainty, but standard mean-field Gaussian posteriors require $O(mn)$ parameters for each $m \times n$ weight matrix. We argue that this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{n \times r}$, we induce a posterior that is \emph{singular} with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. This singularity captures structured weight correlations through shared latent factors and is geometrically distinct from the independence assumption of mean-field posteriors. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{mn}$, and we prove loss bounds that use the Eckart--Young--Mirsky theorem to decompose the error into an optimization term and a rank-induced bias term. We further adapt recent Gaussian complexity bounds for low-rank deterministic networks to Bayesian predictive means. Empirically, across MLPs, LSTMs, and Transformers on standard benchmarks, our method achieves competitive predictive performance while using up to $33\times$ fewer parameters than 5-member Deep Ensembles. It substantially improves OOD detection and often improves calibration relative to mean-field and perturbation baselines, although Deep Ensembles can still be stronger on in-distribution likelihood-based metrics.
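
To make the parameter scaling concrete, the following is a minimal sketch rather than the paper's implementation. It assumes, purely for illustration, that each entry of $A$ and $B$ carries an independent Gaussian with its own mean and a shared fixed standard deviation; the function names `param_counts` and `sample_weight` and the value `std=0.1` are hypothetical. It compares the per-layer parameter count of a mean-field posterior over $W$ with that of a factorized posterior, and checks that a sampled $W = AB^{\top}$ has rank at most $r$.

```python
import numpy as np


def param_counts(m, n, r):
    """Per-layer counts, assuming one mean and one std per Gaussian entry."""
    mean_field = 2 * m * n        # mean + std for every entry of W
    low_rank = 2 * r * (m + n)    # mean + std for every entry of A and B
    return mean_field, low_rank


def sample_weight(mean_A, mean_B, std, rng):
    """Draw W = A B^T with A, B sampled around their means (illustrative prior)."""
    A = mean_A + std * rng.standard_normal(mean_A.shape)
    B = mean_B + std * rng.standard_normal(mean_B.shape)
    return A @ B.T


if __name__ == "__main__":
    m, n, r = 512, 256, 8
    mf, lr = param_counts(m, n, r)
    print(f"mean-field: {mf} parameters, low-rank: {lr} parameters")

    rng = np.random.default_rng(0)
    mean_A = rng.standard_normal((m, r))
    mean_B = rng.standard_normal((n, r))
    W = sample_weight(mean_A, mean_B, std=0.1, rng=rng)
    # Every sample lies on the set of matrices with rank at most r.
    print("shape:", W.shape, "rank <= r:", np.linalg.matrix_rank(W) <= r)
```

Because every sample is a product of an $m \times r$ and an $r \times n$ matrix, the induced distribution over $W$ places no mass on full-rank matrices when $r < \min(m, n)$, which is the sense in which it is singular with respect to the Lebesgue measure on $\mathbb{R}^{m \times n}$.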