A number of popular transfer learning methods rely on grid search to select regularization hyperparameters that control over-fitting. This grid search requirement has several key disadvantages: the search is computationally expensive, requires carving out a validation set that reduces the size of available data for model training, and requires practitioners to specify candidate values. In this paper, we propose an alternative to grid search: directly learning regularization hyperparameters on the full training set via model selection techniques based on the evidence lower bound ("ELBo") objective from variational methods. For deep neural networks with millions of parameters, we specifically recommend a modified ELBo that upweights the influence of the data likelihood relative to the prior while remaining a valid bound on the evidence for Bayesian model selection. Our proposed technique overcomes all three disadvantages of grid search. We demonstrate effectiveness on image classification tasks on several datasets, yielding held-out accuracy comparable to existing approaches with far less compute time.
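To make the core idea concrete, the following is a minimal sketch of learning a regularization hyperparameter by maximizing an ELBo-style objective, not the paper's deep-network method. It assumes a conjugate 1-D Gaussian model (data y_i ~ N(w, 1), prior w ~ N(0, tau), variational posterior q(w) = N(m, v)), where both the optimal q and the ELBo-optimal prior variance tau have closed forms, so we can alternate updates instead of grid-searching over tau. The data-emphasis factor `kappa`, which upweights the likelihood term relative to the KL term, is an illustrative stand-in for the modified objective described above.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, size=50)  # toy training data
N, S = len(y), y.sum()


def fit(kappa=1.0, iters=100):
    """Alternate closed-form updates of q(w) = N(m, v) and the prior
    variance tau, maximizing a (kappa-weighted) ELBo on the full data.

    kappa = 1 recovers the standard ELBo; kappa > 1 upweights the
    expected log-likelihood term relative to the KL(q || prior) term.
    """
    tau = 1.0  # prior variance: the regularization hyperparameter being learned
    for _ in range(iters):
        # Optimal q given tau (Gaussian conjugacy, likelihood weighted by kappa):
        prec = kappa * N + 1.0 / tau  # posterior precision
        v = 1.0 / prec                # posterior variance
        m = kappa * S / prec          # posterior mean
        # Optimal tau given q: maximizing -KL(N(m, v) || N(0, tau))
        # over tau yields tau = v + m^2.
        tau = v + m * m
    return m, v, tau


m, v, tau = fit()
print(f"learned prior variance tau = {tau:.3f}, posterior mean m = {m:.3f}")
```

No validation split or candidate grid for tau is needed: the hyperparameter is updated from the same objective and the same training data as the posterior, which is the property the abstract highlights. In this conjugate toy the update is exact; for deep networks the paper instead optimizes the (modified) ELBo by gradient methods.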