Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received significant interest. Meta-learning methods leverage data from previous tasks to learn a new task in a sample-efficient manner. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods perform well by learning shared representations during pretraining, there is limited theoretical evidence of such behaviour. More importantly, it has not been rigorously shown that these methods still learn a shared structure despite architectural misspecification. In this direction, this work shows, in the limit of an infinite number of tasks, that first-order ANIL with a linear two-layer network architecture successfully learns linear shared representations. This result holds even with a misspecified network parameterisation: having a width larger than the dimension of the shared representations results in an asymptotically low-rank solution. The learnt solution then yields good adaptation performance on any new task after a single gradient step. Overall, this illustrates how well model-agnostic methods such as first-order ANIL can learn shared representations.
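To make the setting concrete, the following is a minimal NumPy sketch of a first-order ANIL update on a linear two-layer network, where the prediction for an input x is x^T B w. It is an illustration under stated assumptions, not the procedure analysed in the paper: the squared loss, the Gaussian data model, the finite task batch (the result above concerns the infinite-task limit), the step sizes, and all names (`loss`, `grads`, `sample_task`, `fo_anil_step`, `B_star`, `width`) are hypothetical choices made for this example.

```python
import numpy as np

def loss(B, w, X, y):
    # Mean squared error (halved) of the linear two-layer model x -> x^T B w.
    r = X @ (B @ w) - y
    return 0.5 * np.mean(r ** 2)

def grads(B, w, X, y):
    # Exact gradients of `loss` with respect to the representation B and the head w.
    r = (X @ (B @ w) - y) / len(y)
    g = X.T @ r                      # shape (d,)
    return np.outer(g, w), B.T @ g   # shapes (d, width) and (width,)

def sample_task(rng, B_star, n_in, n_out, noise=0.0):
    # Assumed data model: y = x^T B_star w_star + noise, with Gaussian x and w_star.
    d, k = B_star.shape
    w_star = rng.standard_normal(k)
    X = rng.standard_normal((n_in + n_out, d))
    y = X @ (B_star @ w_star) + noise * rng.standard_normal(n_in + n_out)
    return X[:n_in], y[:n_in], X[n_in:], y[n_in:]

def fo_anil_step(B, w, tasks, alpha, beta):
    # One outer step of first-order ANIL over a batch of tasks.
    gB_sum, gw_sum = np.zeros_like(B), np.zeros_like(w)
    for X_in, y_in, X_out, y_out in tasks:
        # Inner loop: adapt only the head w with one gradient step; B stays fixed.
        _, gw_in = grads(B, w, X_in, y_in)
        w_task = w - alpha * gw_in
        # Outer gradient at (B, w_task); the dependence of w_task on (B, w)
        # is ignored, which is the first-order approximation.
        gB, gw = grads(B, w_task, X_out, y_out)
        gB_sum += gB
        gw_sum += gw
    n = len(tasks)
    return B - beta * gB_sum / n, w - beta * gw_sum / n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, width = 10, 2, 4           # width > k: misspecified parameterisation
    B_star, _ = np.linalg.qr(rng.standard_normal((d, k)))
    B = 0.1 * rng.standard_normal((d, width))
    w = 0.1 * rng.standard_normal(width)
    for _ in range(200):
        tasks = [sample_task(rng, B_star, 20, 20) for _ in range(20)]
        B, w = fo_anil_step(B, w, tasks, alpha=0.1, beta=0.05)
    # Singular values of B, to inspect how many directions carry weight.
    print(np.linalg.svd(B, compute_uv=False))
```

The sketch deliberately sets `width` larger than the true representation dimension `k` so that the singular values of `B` can be inspected for low-rank structure. Whether a short run with these particular step sizes and task counts exhibits the asymptotic behaviour described above is not guaranteed and has not been verified here.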