Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$ and $B \in \mathbb{R}^{n \times r}$, we induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the manifold of rank-$r$ matrices. This singularity captures structured weight correlations through shared latent factors, a geometry distinct from mean-field's independence assumption. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{mn}$, and prove loss bounds that decompose the error into an optimization term and a rank-induced bias via the Eckart-Young-Mirsky theorem. We further adapt recent Gaussian complexity bounds for low-rank deterministic networks to Bayesian predictive means. Empirically, across MLPs, LSTMs, and Transformers on standard benchmarks, our method achieves predictive performance competitive with 5-member Deep Ensembles while using up to $15\times$ fewer parameters. It also substantially improves OOD detection and often yields better calibration than mean-field and perturbation baselines.
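To make the parameterization concrete, the sketch below (not the paper's implementation) shows one way to realize a low-rank variational linear layer in PyTorch: mean-field Gaussian variational posteriors are placed on the factors $A$ and $B$, a reparameterized sample gives $W = AB^{\top}$, and the KL term involves only $r(m+n)$ variational mean/scale pairs rather than $mn$. The class name, initialization, prior scale, and exact KL form are illustrative assumptions.

```python
# A minimal sketch (assumptions, not the authors' code) of a low-rank variational
# linear layer: W = A B^T with independent Gaussian posteriors over the factors.
import math
import torch
import torch.nn as nn


class LowRankVariationalLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int, prior_std: float = 1.0):
        super().__init__()
        # Variational means and log-std-devs for A (out x r) and B (in x r):
        # 2 * r * (m + n) variational parameters instead of 2 * m * n.
        self.A_mu = nn.Parameter(torch.randn(out_features, rank) / math.sqrt(rank))
        self.B_mu = nn.Parameter(torch.randn(in_features, rank) / math.sqrt(in_features))
        self.A_logstd = nn.Parameter(torch.full((out_features, rank), -3.0))
        self.B_logstd = nn.Parameter(torch.full((in_features, rank), -3.0))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.prior_std = prior_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterized sample of the factors; W = A B^T has rank at most r,
        # so the induced posterior over W is singular w.r.t. Lebesgue measure on R^{m x n}.
        A = self.A_mu + self.A_logstd.exp() * torch.randn_like(self.A_mu)
        B = self.B_mu + self.B_logstd.exp() * torch.randn_like(self.B_mu)
        W = A @ B.t()  # (out_features, in_features)
        return x @ W.t() + self.bias

    def kl(self) -> torch.Tensor:
        # KL(q || p) for the factors under a factorized N(0, prior_std^2) prior.
        kl = 0.0
        for mu, logstd in ((self.A_mu, self.A_logstd), (self.B_mu, self.B_logstd)):
            var = (2 * logstd).exp()
            kl = kl + 0.5 * ((var + mu ** 2) / self.prior_std ** 2
                             - 1.0 - 2 * logstd + 2 * math.log(self.prior_std)).sum()
        return kl
```

A network built from such layers would be trained on the usual ELBO (per-batch negative log-likelihood plus the summed `kl()` terms scaled by the dataset size), with the predictive mean estimated by averaging a few reparameterized forward passes.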