Theoretical Characterization of the Generalization Performance of Overfitted Meta-Learning

Meta-learning has emerged as a successful method for improving training performance by training over many similar tasks, especially with deep neural networks (DNNs). However, the theoretical understanding of when and why overparameterized models such as DNNs generalize well in meta-learning remains limited. As an initial step towards addressing this challenge, this paper studies the generalization performance of overfitted meta-learning under a linear regression model with Gaussian features. In contrast to a few recent studies along the same line, our framework allows the number of model parameters to be arbitrarily larger than the number of features in the ground-truth signal, and hence naturally captures the overparameterized regime of practical deep meta-learning. We show that the overfitted min $\ell_2$-norm solution of model-agnostic meta-learning (MAML) can be beneficial, which parallels the recent remarkable findings on the ``benign overfitting'' and ``double descent'' phenomena in classical (single-task) linear regression. However, because of features unique to meta-learning, such as the task-specific gradient-descent inner training and the diversity/fluctuation of the ground-truth signals among training tasks, we find new and interesting properties that do not exist in single-task linear regression. We first provide a high-probability upper bound (with reasonable tightness) on the generalization error, in which certain terms decrease as the number of features increases. Our analysis suggests that benign overfitting is more significant and easier to observe when the noise and the diversity/fluctuation of the ground truth of each training task are large. In this regime, we show that the overfitted min $\ell_2$-norm solution can achieve an even lower generalization error than the underparameterized solution.
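
The single-task baseline that the abstract compares against can be illustrated with a small numerical sketch. The code below is not the paper's MAML analysis: it is a minimal single-task example, assuming isotropic Gaussian features, a ground-truth signal supported on the first `s` of `p` features, and additive Gaussian noise. The function names (`min_norm_fit`, `excess_risk`) and all default values (`n`, `s`, `noise_std`, `trials`) are hypothetical choices made only for illustration.

```python
import numpy as np


def min_norm_fit(X, y):
    """Return the minimum l2-norm least-squares solution of X w = y.

    When X has more columns than rows (p > n) and full row rank, this
    solution interpolates the training data exactly (it "overfits").
    """
    return np.linalg.pinv(X) @ y


def excess_risk(p, n=20, s=5, noise_std=0.5, trials=200, seed=0):
    """Average squared parameter error of the min-norm solution.

    Assumptions (illustrative, not taken from the paper):
      - features are i.i.d. standard Gaussian, so the expected squared
        prediction error on a noiseless test point equals ||w_hat - w_true||^2;
      - the ground truth uses only the first s of the p features (p >= s).
    """
    rng = np.random.default_rng(seed)
    w_true = np.zeros(p)
    w_true[:s] = 1.0
    errors = []
    for _ in range(trials):
        X = rng.standard_normal((n, p))
        y = X @ w_true + noise_std * rng.standard_normal(n)
        w_hat = min_norm_fit(X, y)
        errors.append(np.sum((w_hat - w_true) ** 2))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Sweep the number of model features p across the interpolation
    # threshold p = n to inspect how the error behaves on both sides.
    for p in (5, 10, 15, 20, 25, 40, 100, 400):
        print(f"p = {p:4d}  average error = {excess_risk(p):.3f}")
```

Sweeping `p` from below `n` to well above it lets one compare the underparameterized and overparameterized regimes in this toy setting; the meta-learning results described above additionally involve the inner gradient-descent step and the variation of ground-truth signals across tasks, which this sketch omits.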
