Infinitely wide or deep neural networks (NNs) with independent and identically distributed (i.i.d.) parameters have been shown to be equivalent to Gaussian processes. Because of the favorable properties of Gaussian processes, this equivalence is commonly employed to analyze neural networks and has led to various breakthroughs over the years. However, neural networks and Gaussian processes are equivalent only in the limit; in the finite case, there are currently no methods to approximate a trained neural network with a Gaussian model while bounding the approximation error. In this work, we present an algorithmic framework to approximate a neural network of finite width and depth, and with not necessarily i.i.d. parameters, by a mixture of Gaussian processes with explicit bounds on the approximation error. In particular, we adopt the Wasserstein distance to quantify the closeness between probabilistic models and, relying on tools from optimal transport and Gaussian processes, we iteratively approximate the output distribution of each layer of the neural network as a mixture of Gaussian processes. Crucially, for any NN and any $\epsilon > 0$, our approach returns a mixture of Gaussian processes that is $\epsilon$-close to the NN at a finite set of input points. Furthermore, we exploit the differentiability of the resulting error bound to show how our approach can be employed to tune the parameters of a NN to mimic the functional behavior of a given Gaussian process, e.g., for prior selection in the context of Bayesian inference. We empirically investigate the effectiveness of our results on both regression and classification problems with various neural network architectures. Our experiments highlight how our results can represent an important step towards understanding neural network predictions and formally quantifying their uncertainty.
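As a point of reference for the distance used above (a standard optimal-transport result, not a contribution of this work), the 2-Wasserstein distance between two Gaussian measures admits a closed form:
$$
W_2^2\big(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\big)
= \|m_1 - m_2\|_2^2
+ \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\big)^{1/2}\Big).
$$
This is the kind of quantity against which the closeness of the layer-wise mixture-of-Gaussian-processes approximation is measured.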