Scalable Bayesian Inference in the Era of Deep Learning: From Gaussian Processes to Deep Neural Networks

Large neural networks trained on large datasets have become the dominant paradigm in machine learning. These systems rely on maximum likelihood point estimates of their parameters, which precludes them from expressing model uncertainty. This can result in overconfident predictions and prevents the use of deep learning models for sequential decision making. This thesis develops scalable methods to equip neural networks with model uncertainty. In particular, we leverage the linearised Laplace approximation to equip pre-trained neural networks with the uncertainty estimates provided by their tangent linear models. This turns the problem of Bayesian inference in neural networks into one of Bayesian inference in conjugate Gaussian-linear models. Alas, the cost of this remains cubic in either the number of network parameters or the number of observations times the number of output dimensions, and by assumption neither is tractable. We address this intractability by using stochastic gradient descent (SGD), the workhorse algorithm of deep learning, to perform posterior sampling in linear models and their convex duals, Gaussian processes. With this, we turn back to linearised neural networks. We find that, when used for hyperparameter learning, the linearised Laplace approximation presents several incompatibilities with modern deep learning practices, namely stochastic optimisation, early stopping and normalisation layers. We resolve these incompatibilities and construct a sample-based expectation-maximisation (EM) algorithm for scalable hyperparameter learning with linearised neural networks. We apply the above methods to perform linearised neural network inference with ResNet-50 (25M parameters) trained on ImageNet (1.2M observations and 1000 output dimensions). Additionally, we apply our methods to estimate uncertainty for 3D tomographic reconstructions obtained with the deep image prior network.
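Once the network is linearised, inference reduces to a Gaussian-linear model. Its predictive covariance can be computed in two equivalent ways, which shows the two cubic costs mentioned above:

- in weight space, which needs a solve with a d x d matrix (d parameters);
- in the dual Gaussian-process form, which needs a solve with an n x n matrix (n observations).

This is a minimal NumPy sketch under stated assumptions, not code from the thesis:

- random matrices stand in for the Jacobians of a trained network;
- the model has a single output;
- the prior is an isotropic Gaussian with precision `prior_prec`;
- the noise is homoscedastic with variance `noise_var`;
- the function names are hypothetical.

```python
import numpy as np


def primal_predictive_cov(J, J_star, noise_var, prior_prec):
    """Predictive covariance J* Sigma J*^T computed in weight space.

    Sigma is the d x d posterior covariance of the weights, so the solve costs O(d^3).
    """
    d = J.shape[1]
    precision = J.T @ J / noise_var + prior_prec * np.eye(d)  # posterior precision
    return J_star @ np.linalg.solve(precision, J_star.T)


def dual_predictive_cov(J, J_star, noise_var, prior_prec):
    """The same covariance computed as a Gaussian process posterior.

    The kernel is k(x, x') = J(x) J(x')^T / prior_prec; the solve is n x n, so O(n^3).
    """
    n = J.shape[0]
    K = J @ J.T / prior_prec                # train-train kernel matrix
    K_sn = J_star @ J.T / prior_prec        # test-train kernel matrix
    K_ss = J_star @ J_star.T / prior_prec   # test-test kernel matrix
    return K_ss - K_sn @ np.linalg.solve(K + noise_var * np.eye(n), K_sn.T)


rng = np.random.default_rng(0)
n, d = 50, 200
J = rng.standard_normal((n, d))       # stand-in for the Jacobian at n training inputs
J_star = rng.standard_normal((5, d))  # stand-in for the Jacobian at 5 test inputs

cov_primal = primal_predictive_cov(J, J_star, noise_var=0.1, prior_prec=1.0)
cov_dual = dual_predictive_cov(J, J_star, noise_var=0.1, prior_prec=1.0)
print(np.allclose(cov_primal, cov_dual))
```

The two results agree by the Woodbury matrix identity. Which form is cheaper depends on whether d or n is smaller. With c outputs, the dual system grows to nc x nc.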
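One standard way to use SGD as a posterior sampler for a Gaussian-linear model is "sample-then-optimise". The procedure has two steps:

1. Draw a prior sample θ₀ ~ N(0, α⁻¹I) and a noise sample ε ~ N(0, σ²I).
2. Minimise ||y − ε − Jθ||²/(2σ²) + (α/2)||θ − θ₀||².

The minimiser is Σ(Jᵀ(y − ε)/σ² + αθ₀), with Σ = (JᵀJ/σ² + αI)⁻¹. It has exactly the posterior mean and covariance, so each optimiser run yields one posterior sample without factorising Σ.

The sketch below is a hedged NumPy illustration of this general idea, not the thesis's specific algorithm. It assumes:

- an isotropic prior, homoscedastic noise and a single output;
- a plain minibatch SGD loop with iterate averaging.

The function names and settings (step size, batch size, number of steps) are hypothetical.

```python
import numpy as np


def perturbed_objective(y, d, noise_var, prior_prec, rng):
    """Draw the randomisation for one sample: a prior draw and perturbed targets."""
    theta0 = rng.standard_normal(d) / np.sqrt(prior_prec)              # theta0 ~ N(0, I / prior_prec)
    target = y - np.sqrt(noise_var) * rng.standard_normal(y.shape[0])  # y - eps, eps ~ N(0, noise_var I)
    return theta0, target


def sgd_minimise(J, target, theta0, noise_var, prior_prec, rng, steps=5000, batch=20):
    """Minimise ||target - J theta||^2 / (2 noise_var) + prior_prec / 2 ||theta - theta0||^2
    with minibatch SGD, returning the average of the second half of the iterates."""
    n, d = J.shape
    # Conservative constant step size from the full-batch curvature (hypothetical choice).
    lr = 0.2 / (np.linalg.norm(J, 2) ** 2 / noise_var + prior_prec)
    theta = theta0.copy()
    theta_avg = np.zeros(d)
    burn_in = steps // 2
    for t in range(steps):
        idx = rng.integers(0, n, size=batch)  # minibatch drawn with replacement
        J_b = J[idx]
        resid = target[idx] - J_b @ theta
        # Unbiased estimate of the full gradient: the data term is rescaled by n / batch.
        grad = -(n / batch) * (J_b.T @ resid) / noise_var + prior_prec * (theta - theta0)
        theta = theta - lr * grad
        if t >= burn_in:
            theta_avg += theta / (steps - burn_in)
    return theta_avg


def exact_minimise(J, target, theta0, noise_var, prior_prec):
    """Closed-form minimiser of the same objective, used only as a check."""
    d = J.shape[1]
    precision = J.T @ J / noise_var + prior_prec * np.eye(d)
    return np.linalg.solve(precision, J.T @ target / noise_var + prior_prec * theta0)


rng = np.random.default_rng(0)
n, d, noise_var, prior_prec = 200, 10, 1.0, 1.0
J = rng.standard_normal((n, d))  # stand-in features (e.g. a network Jacobian)
y = J @ rng.standard_normal(d) + np.sqrt(noise_var) * rng.standard_normal(n)

theta0, target = perturbed_objective(y, d, noise_var, prior_prec, rng)
sample_sgd = sgd_minimise(J, target, theta0, noise_var, prior_prec, rng)
sample_exact = exact_minimise(J, target, theta0, noise_var, prior_prec)
rel_err = np.linalg.norm(sample_sgd - sample_exact) / np.linalg.norm(sample_exact)
print(f"relative error of the SGD sample: {rel_err:.2e}")
```

A constant step size leaves noise in the final iterate; averaging the second half of the iterates damps it. The exact solve is included only to check the SGD result on this small problem.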
