Bayesian neural networks (BNNs) promise to combine the predictive performance of neural networks with principled uncertainty modeling, which is important for safety-critical systems and decision making. However, posterior uncertainty estimates depend on the choice of prior, and finding informative priors in weight space has proven difficult. This has motivated variational inference (VI) methods that place priors directly on the function generated by the BNN rather than on its weights. In this paper, we address a fundamental issue with such function-space VI approaches pointed out by Burt et al. (2020), who showed that the objective function (ELBO) is negative infinity for most priors of interest. Our solution builds on generalized VI (Knoblauch et al., 2019) with the regularized KL divergence (Quang, 2019) and is, to the best of our knowledge, the first well-defined variational objective for function-space inference in BNNs with Gaussian process (GP) priors. Experiments show that our method incorporates the properties specified by the GP prior on synthetic and small real-world data sets, and that it provides competitive uncertainty estimates for regression, classification, and out-of-distribution detection compared to BNN baselines with both function- and weight-space priors.
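To make the construction concrete, the following is a schematic sketch in our own notation (an illustrative reading of the cited references, not an equation quoted from this paper): generalized VI swaps the exact KL term in the ELBO for the regularized KL divergence of Quang (2019), which remains finite even when the exact divergence between the variational measure $q$ and the GP prior $p$ is infinite,
\[
\mathcal{L}_\gamma(q) \;=\; \mathbb{E}_{q(f)}\!\big[\log p(\mathbf{y} \mid f)\big] \;-\; D_{\mathrm{KL}}^{\gamma}(q \,\|\, p), \qquad \gamma > 0 .
\]
Restricted to a finite set of measurement points, where both measures reduce to multivariate Gaussians $\mathcal{N}(m_q, K_q)$ and $\mathcal{N}(m_p, K_p)$, the regularized divergence can be read as the ordinary Gaussian KL after jittering both covariances,
\[
D_{\mathrm{KL}}^{\gamma}\!\big(\mathcal{N}(m_q, K_q) \,\big\|\, \mathcal{N}(m_p, K_p)\big)
\;=\; D_{\mathrm{KL}}\!\big(\mathcal{N}(m_q, K_q + \gamma I) \,\big\|\, \mathcal{N}(m_p, K_p + \gamma I)\big).
\]
Adding $\gamma I$ makes both covariance operators invertible with bounded inverse, which is what removes the negative-infinity pathology identified by Burt et al. (2020); as $\gamma \to 0$, the regularized divergence is known to recover the exact KL whenever the latter is finite.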