Hyperparameter selection greatly affects the effectiveness of deep learning but requires manual effort and expertise. Recent work shows that Bayesian model selection with Laplace approximations makes it possible to optimize such hyperparameters just like standard neural network parameters, using gradients computed on the training data. However, estimating a single hyperparameter gradient requires a pass through the entire dataset, which limits the scalability of such algorithms. In this work, we overcome this issue by introducing lower bounds on the linearized Laplace approximation of the marginal likelihood. In contrast to previous estimators, these bounds are amenable to stochastic-gradient-based optimization and allow estimation accuracy to be traded off against computational complexity. We derive them using the function-space form of the linearized Laplace approximation, which can be estimated using the neural tangent kernel. Experimentally, we show that the estimators can significantly accelerate gradient-based hyperparameter optimization.
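To illustrate how a function-space view can yield a lower bound that needs only part of the data at a time, the following is a minimal sketch and not necessarily the construction used in this work. It assumes a Gaussian likelihood with unit noise, an isotropic prior precision `prior_prec`, and a random matrix `J` standing in for the network Jacobian; all of these names and choices are hypothetical. Under these assumptions the log-determinant term of the linearized Laplace marginal likelihood can be written with the kernel `J @ J.T / prior_prec`, and Fischer's inequality implies that keeping only the diagonal blocks of that kernel can only decrease the term.

```python
import numpy as np

def logdet_term(J, prior_prec, blocks):
    """Return -0.5 * sum_b logdet(I + J_b J_b^T / prior_prec) over index blocks.

    With a single block holding all indices this is the full function-space
    log-determinant term; with several blocks it is a block-diagonal bound.
    """
    total = 0.0
    for idx in blocks:
        # NTK-style kernel restricted to the data points in this block
        K = J[idx] @ J[idx].T / prior_prec
        _, logdet = np.linalg.slogdet(np.eye(len(idx)) + K)
        total += -0.5 * logdet
    return total

rng = np.random.default_rng(0)
N, P = 12, 5                      # hypothetical: 12 data points, 5 parameters
J = rng.normal(size=(N, P))       # stand-in for the network Jacobian
prior_prec = 2.0                  # assumed isotropic prior precision

# Full-data term (one block containing every index)
exact = logdet_term(J, prior_prec, [np.arange(N)])

# Block-diagonal term over 4 disjoint subsets of the data
bound = logdet_term(J, prior_prec, np.array_split(np.arange(N), 4))

# Parameter-space form of the same quantity (Sylvester's determinant identity)
_, logdet_param = np.linalg.slogdet(np.eye(P) + J.T @ J / prior_prec)
param_space = -0.5 * logdet_param

print(f"exact={exact:.4f}  bound={bound:.4f}  param_space={param_space:.4f}")
```

Because each block touches only its own subset of the data, a randomly chosen block can be evaluated on its own, and using more, smaller blocks lowers the cost at the price of a looser bound.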