Few-shot learning for neural networks (NNs) is an important problem that aims to train NNs with only a few data points. The main challenge is avoiding overfitting, since over-parameterized NNs can easily overfit to such small datasets. Previous work (e.g., MAML by Finn et al., 2017) tackles this challenge by meta-learning, which learns how to learn from a few data points across a variety of tasks. On the other hand, one conventional approach to avoiding overfitting is to restrict the hypothesis space by imposing sparse NN structures, such as convolution layers in computer vision. However, although such manually designed sparse structures are sample-efficient for sufficiently large datasets, they are still insufficient for few-shot learning. The following questions then naturally arise: (1) Can we find sparse structures that are effective for few-shot learning by meta-learning? (2) What benefits would this bring in terms of meta-generalization? In this work, we propose a novel meta-learning approach, called Meta-ticket, to find optimal sparse subnetworks for few-shot learning within randomly initialized NNs. We empirically validate that Meta-ticket successfully discovers sparse subnetworks that can learn specialized features for each given task. Owing to this task-wise adaptation ability, Meta-ticket achieves superior meta-generalization compared to MAML-based methods, especially with large NNs. The code is available at: https://github.com/dchiji-ntt/meta-ticket