Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific and engineering applications where decisions must be made under limited or noisy data. Bayesian neural networks (BNNs) provide a framework for this purpose by constructing a Bayesian posterior distribution over the network parameters. However, the prior, which is of key importance in any Bayesian setting, is rarely meaningful for BNNs: the complexity of the input-to-output map of a BNN makes it difficult to understand how a given distribution over the parameters enforces any interpretable constraint on the output space. Gaussian processes (GPs), in contrast, are often preferred for uncertainty quantification tasks because their priors are interpretable. Their drawback is that they do not scale beyond small datasets without advanced approximation techniques, which typically rely on the covariance kernel having a specific structure. To address these challenges, we introduce a new class of priors for BNNs, called Mercer priors, under which the resulting BNN has samples that approximate those of a specified GP. The method works by defining a prior directly over the network parameters via the Mercer representation of the covariance kernel, and does not rely on the network having a specific structure. In doing so, we can exploit the scalability of BNNs in a meaningful Bayesian way.
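For concreteness, the Mercer representation referenced above is the standard eigen-expansion of a continuous, positive semi-definite covariance kernel, and the associated Karhunen-Loeve expansion expresses GP samples over the same basis; the notation $(\lambda_i, \phi_i, w_i)$ below is illustrative and not taken from the paper itself:

$$k(x, x') = \sum_{i=1}^{\infty} \lambda_i \, \phi_i(x) \, \phi_i(x'), \qquad f(x) = \sum_{i=1}^{\infty} \sqrt{\lambda_i} \, w_i \, \phi_i(x), \quad w_i \overset{\text{iid}}{\sim} \mathcal{N}(0, 1),$$

where $\lambda_i \ge 0$ and $\phi_i$ are the eigenvalues and orthonormal eigenfunctions of the kernel integral operator. A Mercer prior can loosely be read as a distribution on the network parameters chosen so that the BNN's output behaves like such an expansion for the specified kernel.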