We pursue transfer learning to improve classifier accuracy on a target task with few labeled examples available for training. Recent work suggests that using a source task to learn a prior distribution over neural net weights, not just an initialization, can boost target task performance. In this study, we carefully compare transfer learning with and without source-task-informed priors across 5 datasets. We find that standard transfer learning, which uses the source task only to set an initialization, performs far better than reported in previous comparisons. The relative gains of methods using informative priors over standard transfer learning vary in magnitude across datasets. For the scenario of 5-300 examples per class, we find negative or negligible gains on 2 datasets, modest gains (between 1.5 and 3 points of accuracy) on 2 other datasets, and substantial gains (>8 points) on one dataset. Among methods using informative priors, we find that an isotropic covariance appears competitive with a learned low-rank covariance matrix while being substantially simpler to understand and tune. Further analysis suggests that the mechanistic justification for informed priors (a hypothesized improved alignment between train and test loss landscapes) is not consistently supported, due to high variability in empirical landscapes. We release code to allow independent reproduction of all experiments.