A number of popular transfer learning methods rely on grid search to select regularization hyperparameters that control over-fitting. This grid search requirement has several key disadvantages: the search is computationally expensive, requires carving out a validation set that reduces the data available for model training, and requires practitioners to specify candidate values in advance. In this paper, we propose an alternative to grid search: directly learning regularization hyperparameters on the full training set via model selection techniques based on the evidence lower bound ("ELBo") objective from variational methods. For deep neural networks with millions of parameters, we specifically recommend a modified ELBo that upweights the influence of the data likelihood relative to the prior while remaining a valid bound on the evidence for Bayesian model selection. Our proposed technique overcomes all three disadvantages of grid search. We demonstrate effectiveness on image classification tasks on several datasets, yielding held-out accuracy comparable to existing approaches with far less compute time.
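To make the modified objective concrete, here is a minimal sketch of an ELBo whose likelihood term is upweighted by a factor kappa, using a one-dimensional Gaussian variational posterior with a closed-form KL term. The function names and the symbol `kappa` are illustrative assumptions, not the paper's notation; with `kappa = 1.0` the expression reduces to the standard ELBo.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """Closed-form KL( N(mu_q, var_q) || N(mu_p, var_p) ) for 1-D Gaussians."""
    return 0.5 * (var_q / var_p
                  + (mu_q - mu_p) ** 2 / var_p
                  - 1.0
                  + math.log(var_p / var_q))

def modified_elbo(expected_loglik, kl, kappa=1.0):
    """Likelihood-upweighted ELBo sketch (an assumption, not the exact objective):
    kappa * E_q[log p(data | params)] - KL(q || prior).
    kappa = 1.0 recovers the standard ELBo; kappa > 1.0 emphasizes the data term
    relative to the prior, as the abstract describes."""
    return kappa * expected_loglik - kl

# Toy usage: identical posterior and prior give zero KL,
# so the objective is just the (weighted) expected log-likelihood.
kl = gaussian_kl(0.0, 1.0, 0.0, 1.0)          # 0.0
standard = modified_elbo(-5.0, kl, kappa=1.0)  # -5.0
emphasized = modified_elbo(-5.0, kl, kappa=2.0)  # -10.0
```

In practice the regularization hyperparameters (e.g., prior variances) would be optimized jointly with the variational parameters by gradient ascent on this objective, which is what lets the method avoid a separate validation set.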