Pre-trained deep neural networks can be adapted to perform uncertainty estimation by transforming them into Bayesian neural networks via methods such as the Laplace approximation (LA) or its linearized form (LLA), among others. To make these methods more tractable, the generalized Gauss-Newton (GGN) approximation is often used. However, because they remain computationally expensive, both LA and LLA rely on further approximations, such as Kronecker-factored or diagonal approximations of the GGN matrix, which can affect the quality of the results. To address these issues, we propose a new method for scaling LLA using a variational sparse Gaussian process (GP) approximation based on the dual RKHS formulation of GPs. Our method retains the predictive mean of the original model while allowing for efficient stochastic optimization and scalability in both the number of parameters and the size of the training dataset. Moreover, its training cost is independent of the number of training points, an improvement over existing methods. Our preliminary experiments indicate that it outperforms existing efficient variants of LLA, such as accelerated LLA (ELLA), which is based on the Nyström approximation.
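To make the quantities named above concrete, the following is a minimal NumPy sketch of LLA predictive variances on a toy regression problem. It is not the proposed method: it only contrasts the full GGN posterior with a diagonal GGN approximation, and checks that the full weight-space variance agrees with the equivalent function-space (GP) form that sparse GP approximations start from. Everything here is an assumption made for illustration: the one-hidden-layer tanh network, the random weights standing in for a pre-trained model, the Gaussian likelihood with noise variance `sigma2`, the isotropic prior precision `alpha`, and the helper name `jacobian`.

```python
import numpy as np

def jacobian(x, w1, b1, w2):
    """Jacobian of f(x) = sum_j w2[j] * tanh(w1[j] * x + b1[j]) + b2
    with respect to all parameters. x has shape (N,); result is (N, 3H + 1)."""
    h = np.tanh(np.outer(x, w1) + b1)        # hidden activations, (N, H)
    dh = (1.0 - h**2) * w2                   # df / d(pre-activation), (N, H)
    return np.hstack([dh * x[:, None],       # df/dw1
                      dh,                    # df/db1
                      h,                     # df/dw2
                      np.ones((x.size, 1))]) # df/db2

# Random weights stand in for a pre-trained network (illustrative assumption).
rng = np.random.default_rng(0)
H = 5
w1, b1, w2 = rng.normal(size=H), rng.normal(size=H), rng.normal(size=H)

x_train = np.linspace(-2.0, 2.0, 20)
x_test = np.array([-3.0, 0.0, 3.0])
alpha, sigma2 = 1.0, 0.1                     # prior precision, noise variance

J = jacobian(x_train, w1, b1, w2)            # (N, P)
Js = jacobian(x_test, w1, b1, w2)            # (M, P)
P = J.shape[1]

# Weight-space LLA: posterior precision = GGN + prior precision.
precision = J.T @ J / sigma2 + alpha * np.eye(P)
var_full = np.einsum("np,pq,nq->n", Js, np.linalg.inv(precision), Js)

# Diagonal GGN approximation: keep only the diagonal of the precision.
var_diag = np.sum(Js**2 / np.diag(precision), axis=1)

# Function-space view: GP with kernel k(x, x') = J(x) J(x')^T / alpha.
K = J @ J.T / alpha
Ks = Js @ J.T / alpha
Kss = np.sum(Js**2, axis=1) / alpha
solved = np.linalg.solve(K + sigma2 * np.eye(x_train.size), Ks.T)
var_gp = Kss - np.einsum("nm,mn->n", Ks, solved)

print("full GGN :", var_full)
print("diag GGN :", var_diag)
print("GP form  :", var_gp)
```

In this sketch the weight-space form inverts a P x P matrix and the function-space form solves an N x N system, which is why both become impractical for large models and datasets. The diagonal variant avoids the inversion but generally yields different variances from the full GGN posterior.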