Weight space symmetries in neural network architectures, such as permutation symmetries in MLPs, give rise to Bayesian neural network (BNN) posteriors with many equivalent modes. This multimodality poses a challenge for variational inference (VI) techniques, which typically rely on approximating the posterior with a unimodal distribution. In this work, we investigate the impact of weight space permutation symmetries on VI. We demonstrate, both theoretically and empirically, that these symmetries lead to biases in the approximate posterior that degrade predictive performance and posterior fit if not explicitly accounted for. To address this, we leverage the symmetric structure of the posterior and devise a symmetrization mechanism for constructing permutation-invariant variational posteriors. We show that the symmetrized distribution provides a strictly better fit to the true posterior, and that it can be trained using the original ELBO objective with a modified KL regularization term. We demonstrate experimentally that our approach mitigates the aforementioned biases and results in improved predictions and a higher ELBO.
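For concreteness, the construction can be sketched as follows; the notation here, including the group $G$ and the base distribution $q$, is an assumed form of the mechanism rather than the paper's exact definitions. Let $G$ denote the finite group of weight-space permutations under which the network function is invariant, and let $q$ be a standard (e.g., unimodal Gaussian) variational posterior over the weights $\theta$. The symmetrized posterior averages $q$ over the group:
\[
q_{\mathrm{sym}}(\theta) = \frac{1}{|G|} \sum_{g \in G} q(g \cdot \theta).
\]
Because the likelihood is $G$-invariant, the expected log-likelihood term of the ELBO is unchanged by this averaging, and only the KL regularizer must be evaluated at $q_{\mathrm{sym}}$:
\[
\mathcal{L}(q_{\mathrm{sym}}) = \mathbb{E}_{q}\big[\log p(\mathcal{D} \mid \theta)\big] - \mathrm{KL}\big(q_{\mathrm{sym}} \,\|\, p(\theta)\big),
\]
which is consistent with training under the original ELBO objective with a modified KL regularization term, as described above.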