Classic supervised learning involves algorithms trained on $n$ labeled examples to produce a hypothesis $h \in \mathcal{H}$ aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class $\mathcal{H}$ within some meta-class $\mathbb{H}$. This setting captures many modern problems such as in-context learning, hypernetworks, and learning-to-learn. A common way to evaluate the performance of a supervised learning algorithm is through its learning curve, which depicts the expected error as a function of the number of training examples. In meta-learning, the learning curve becomes a two-dimensional learning surface, which depicts the expected error on unseen tasks for varying values of $n$ (the number of tasks) and $m$ (the number of training examples per task). Our findings characterize the distribution-free learning surfaces of meta-Empirical Risk Minimizers when either $m$ or $n$ tends to infinity: we show that the number of tasks must increase inversely with the desired error. In contrast, the number of examples per task exhibits very different behavior: it satisfies a dichotomy in which every meta-class conforms to one of the following conditions: (i) $m$ must grow inversely with the error, or (ii) a \emph{finite} number of examples per task suffices for the error to vanish as $n$ goes to infinity. This finding identifies and characterizes the cases in which a small number of examples per task suffices for successful learning. We further refine this picture for positive error levels: for each $\varepsilon > 0$, we identify how many examples per task are needed to achieve error $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity. We achieve this by developing a necessary and sufficient condition for meta-learnability with a bounded number of examples per task.
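To make the learning-surface picture concrete, here is a minimal Python sketch (not from the paper; the toy environment and all names such as `sample_task` and `learning_surface_point` are illustrative assumptions) that Monte-Carlo-estimates one slice of a learning surface. The toy meta-class is a family of 1-d threshold classifiers whose task-specific thresholds concentrate around a shared center, so a small fixed $m$ can suffice as $n$ grows, in the spirit of case (ii) of the dichotomy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (our construction, not the paper's): each task is a 1-d
# threshold classifier whose threshold is a small perturbation of a shared
# center that the meta-learner must recover from n tasks.

def sample_task(center, m):
    """Draw m labeled examples from a task whose true threshold is near `center`."""
    t = center + rng.normal(scale=0.1)   # task-specific threshold
    x = rng.uniform(-1.0, 1.0, size=m)
    y = (x >= t).astype(int)
    return x, y, t

def erm_threshold(x, y):
    """Empirical risk minimizer over 1-d thresholds (brute force over candidates)."""
    candidates = np.concatenate(([-1.0], np.sort(x), [1.0]))
    errs = [np.mean((x >= c).astype(int) != y) for c in candidates]
    return candidates[int(np.argmin(errs))]

def learning_surface_point(n, m, center=0.0, trials=50):
    """Monte-Carlo estimate of the expected error on an unseen task after
    meta-training on n tasks with m examples each. Averaging per-task ERM
    thresholds is a crude stand-in for meta-ERM, used only for illustration."""
    errs = []
    for _ in range(trials):
        c_hat = np.mean([erm_threshold(*sample_task(center, m)[:2]) for _ in range(n)])
        x, y, _ = sample_task(center, 1000)   # evaluate on a fresh task
        errs.append(np.mean((x >= c_hat).astype(int) != y))
    return float(np.mean(errs))

# One slice of the surface: error as a function of n for a fixed small m.
for n in (1, 4, 16, 64):
    print(n, round(learning_surface_point(n=n, m=3), 3))
```

In this toy slice the error shrinks as $n$ grows even though $m = 3$ is held fixed, illustrating how a finite number of examples per task can suffice when the tasks share enough structure; it is a sketch of the phenomenon, not of the paper's meta-ERM analysis.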