Bayesian deep learning all too often underfits so that the Bayesian prediction is less accurate than a simple point estimate. Uncertainty quantification then comes at the cost of accuracy. For linearized models, the null space of the generalized Gauss-Newton matrix corresponds to parameters that preserve the training predictions of the point estimate. We propose to build Bayesian approximations in this null space, thereby guaranteeing that the Bayesian predictive does not underfit. We suggest a matrix-free algorithm for projecting onto this null space, which scales linearly with the number of parameters and quadratically with the number of output dimensions. We further propose an approximation that only scales linearly with parameters to make the method applicable to generative models. An extensive empirical evaluation shows that the approach scales to large models, including vision transformers with 28 million parameters.
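The matrix-free null-space projection described above can be sketched with Jacobian-vector and vector-Jacobian products. The toy model below is a minimal illustration, not the paper's implementation: a linear map stands in for a linearized network, so the GGN null space coincides with the Jacobian's null space, and the projection is `P v = v - Jᵀ (J Jᵀ)⁻¹ J v`. All names and shapes here are hypothetical.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for a linearized network: f(w) = X @ w.
# Two training outputs, three parameters, so the null space is 1-D.
X = jnp.array([[1.0, 2.0, 0.0],
               [0.0, 1.0, 1.0]])

def f(w):
    return X @ w

def project_to_null(v, n_params=3, n_outputs=2):
    """Project v onto the null space of the Jacobian of f, matrix-free.

    Computes P v = v - J^T (J J^T)^{-1} J v using only jvp/vjp calls,
    never materializing J over the parameter dimension.
    """
    w0 = jnp.zeros(n_params)
    _, Jv = jax.jvp(f, (w0,), (v,))        # J v via one JVP
    _, vjp_fn = jax.vjp(f, w0)             # gives e -> J^T e

    # Build the small (n_outputs x n_outputs) Gram matrix J J^T with one
    # VJP + JVP per output dimension -- the source of the quadratic cost
    # in the number of outputs mentioned in the abstract.
    def gram_row(e):
        Jt_e = vjp_fn(e)[0]                        # J^T e
        return jax.jvp(f, (w0,), (Jt_e,))[1]       # J J^T e
    JJt = jax.vmap(gram_row)(jnp.eye(n_outputs))

    coeffs = jnp.linalg.solve(JJt, Jv)
    return v - vjp_fn(coeffs)[0]           # v - J^T (J J^T)^{-1} J v

v = jnp.array([1.0, 1.0, 1.0])
v_null = project_to_null(v)
```

For the linear model, perturbing the parameters by `v_null` leaves all training outputs unchanged (`X @ v_null ≈ 0`), which is exactly the property the abstract exploits: posterior mass placed in this subspace cannot degrade the point estimate's training predictions.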