Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialization points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods perform well by learning shared representations during pretraining, there is limited theoretical evidence of this behavior. More importantly, it has not been shown that these methods still learn a shared structure despite architectural misspecification. In this direction, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations. This result even holds under overparametrization: a network width larger than the dimension of the shared representations yields an asymptotically low-rank solution. The learned solution then achieves good adaptation performance on any new task after a single gradient step. Overall, this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations.
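To make the setting concrete, here is a minimal sketch of first-order ANIL on a two-layer linear network trained on synthetic tasks that share a low-rank representation. All names, task distributions, dimensions, and step sizes below are illustrative assumptions for exposition, not the paper's exact construction: the inner loop adapts only the head, and the outer update uses the query-loss gradient at the adapted head while ignoring the Jacobian of the adaptation step.

```python
import numpy as np

# Minimal sketch (assumed setup): two-layer linear model f(x) = x @ B @ w.
# Tasks are linear regressions y = x @ Theta_star @ w_t + noise sharing the
# same low-dimensional representation Theta_star.

rng = np.random.default_rng(0)
d, k, width = 20, 3, 10          # input dim, shared-rep dim, hidden width (> k: overparametrized)
n_in, n_out = 25, 25             # support / query samples per task
alpha, beta = 0.05, 0.01         # inner (adaptation) and outer (meta) step sizes

Theta_star = rng.normal(size=(d, k)) / np.sqrt(d)   # ground-truth shared representation

def sample_task():
    """Draw one regression task whose targets use the shared representation."""
    w_t = rng.normal(size=k)
    X_in, X_out = rng.normal(size=(n_in, d)), rng.normal(size=(n_out, d))
    y_in = X_in @ Theta_star @ w_t + 0.01 * rng.normal(size=n_in)
    y_out = X_out @ Theta_star @ w_t + 0.01 * rng.normal(size=n_out)
    return X_in, y_in, X_out, y_out

B = rng.normal(size=(d, width)) / np.sqrt(d)   # shared layer (meta-learned)
w = np.zeros(width)                            # head (adapted per task)

for step in range(20000):
    X_in, y_in, X_out, y_out = sample_task()
    # Inner loop (ANIL): one gradient step on the head only, representation frozen.
    H_in = X_in @ B
    w_task = w - alpha * H_in.T @ (H_in @ w - y_in) / n_in
    # Outer loop (first-order): query-loss gradient evaluated at the adapted head,
    # ignoring the derivative of the inner step with respect to (B, w).
    H_out = X_out @ B
    resid = H_out @ w_task - y_out
    B -= beta * (X_out.T @ np.outer(resid, w_task)) / n_out
    w -= beta * (H_out.T @ resid) / n_out

# After pretraining, B should be approximately rank k despite width > k,
# with its column space aligned to that of Theta_star.
print("singular values of B:", np.linalg.svd(B, compute_uv=False))
```

In this sketch, the trailing singular values of B decaying toward zero illustrates the asymptotically low-rank behavior claimed in the abstract; adapting only the head with one gradient step on a fresh task then corresponds to the single-step adaptation result.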