Understanding Transfer Learning and Gradient-Based Meta-Learning Techniques

Deep neural networks can perform well on a wide range of tasks but often require large amounts of training data. Meta-learning has received considerable attention as one approach to improving how well these networks generalize from limited data. Although meta-learning techniques have been observed to succeed at this in various scenarios, recent results suggest that, when evaluated on tasks from a different data distribution than the one used for training, a baseline that simply finetunes a pre-trained network may be more effective than more complicated meta-learning techniques such as MAML, one of the most popular of them. This is surprising, as the learning behaviour of MAML mimics that of finetuning: both rely on re-using learned features. We investigate the observed performance differences between finetuning, MAML, and another meta-learning technique called Reptile, and show that MAML and Reptile specialize for fast adaptation in low-data regimes whose data distribution is similar to the one used for training. Our findings show that both the output layer and the noisy training conditions induced by data scarcity play important roles in facilitating this specialization for MAML. Lastly, we show that the pre-trained features obtained by the finetuning baseline are more diverse and discriminative than those learned by MAML and Reptile. Because of this lack of diversity and this distribution specialization, MAML and Reptile may fail to generalize to out-of-distribution tasks, whereas finetuning can fall back on the diversity of its learned features.
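To make the comparison concrete, here is a minimal sketch of a Reptile-style meta-update on a toy problem. It is not the experimental setup of the paper: the scalar model y = w * x, the task family, the hyperparameters, and the function names `sgd_adapt`, `sample_task`, and `reptile` are all assumptions chosen for illustration. Finetuning would call the same `sgd_adapt` routine starting from a pre-trained weight; Reptile differs in how the starting weight is obtained, by repeatedly nudging it toward the weights reached after adapting to sampled tasks. MAML, which differentiates through the inner loop, is not shown.

```python
import random


def sgd_adapt(w, xs, ys, lr=0.05, steps=5):
    """Inner loop: a few gradient-descent steps on the mean squared
    error of the scalar model y = w * x (the same routine a finetuning
    baseline would run on a new task)."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


def sample_task(rng, n=5):
    """Hypothetical task family: y = a * x with slope a drawn from [1, 3].
    Each task provides only n examples (a low-data regime)."""
    a = rng.uniform(1.0, 3.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [a * x for x in xs]
    return xs, ys


def reptile(meta_steps=2000, meta_lr=0.1, seed=0):
    """Outer loop: move the initialization toward the task-adapted weight."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        xs, ys = sample_task(rng)
        w_task = sgd_adapt(w, xs, ys)
        w += meta_lr * (w_task - w)  # Reptile update
    return w


if __name__ == "__main__":
    w_init = reptile()
    # Adapt the meta-learned initialization to one new in-distribution task.
    xs = [-1.0, -0.5, 0.5, 1.0]
    ys = [2.5 * x for x in xs]
    print("meta-learned init:", w_init)
    print("after adaptation: ", sgd_adapt(w_init, xs, ys))
```

In this sketch the learned initialization settles inside the range of slopes seen during meta-training, which is the kind of specialization to the training distribution that the abstract describes. A task whose slope lies far outside [1, 3] would start adaptation from a less helpful point.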

Related content

MAML

MAML (Model-Agnostic Meta-Learning) is one of the classic meta-learning algorithms, introduced in the paper "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". Paper: https://arxiv.org/abs/1703.03400
