Due to its empirical success in few-shot classification and reinforcement learning, meta-learning has recently received considerable interest. Meta-learning leverages data from previous tasks to quickly learn a new task, even when data for that task are limited. In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task. Although it has been empirically suggested that such methods learn a good shared representation during training, there is no strong theoretical evidence of such behavior. More importantly, it is unclear whether these methods truly are model-agnostic, i.e., whether they still learn a shared structure despite architecture misspecifications. To fill this gap, this work shows that, in the limit of an infinite number of tasks, first-order ANIL with a linear two-layer network architecture successfully learns a linear shared representation. Moreover, this result holds despite misspecification: having a large width relative to the hidden dimension of the shared representation does not harm the algorithm's performance. The learnt parameters then yield a small test loss after a single gradient step on any new task. Overall, this illustrates how well model-agnostic methods can adapt to any (unknown) model structure.
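The abstract does not spell out the algorithm, so the following is only a minimal NumPy sketch of first-order ANIL on a linear two-layer model f(x) = xᵀBw, under assumptions that are not taken from the text: Gaussian inputs, noiseless linear tasks y = xᵀB*w* sharing a ground-truth representation B* of hidden dimension k, a single inner gradient step of size `alpha` on the head w only, and a finite batch of tasks per outer step instead of the infinite-task limit. All function names (`sample_task`, `fo_anil`, `one_step_test_loss`, ...) and hyperparameters are hypothetical, chosen for illustration.

```python
import numpy as np


def sample_task(rng, B_star, n_in, n_out):
    """Hypothetical task: y = x^T B* w* with x ~ N(0, I_d), w* ~ N(0, I_k), no noise."""
    d, k = B_star.shape
    theta = B_star @ rng.normal(size=k)    # task regression vector, lies in span(B*)
    X_in = rng.normal(size=(n_in, d))
    X_out = rng.normal(size=(n_out, d))
    return (X_in, X_in @ theta), (X_out, X_out @ theta)


def loss_and_grads(B, w, X, y):
    """Squared loss (1/2m)||X B w - y||^2 and its gradients w.r.t. B and w."""
    m = X.shape[0]
    r = X @ B @ w - y                      # residuals, shape (m,)
    Xtr = X.T @ r / m                      # shape (d,)
    return 0.5 * np.mean(r ** 2), np.outer(Xtr, w), B.T @ Xtr


def adapt_head(B, w, X, y, alpha):
    """ANIL inner loop: one gradient step on the head w; the representation B is frozen."""
    return w - alpha * loss_and_grads(B, w, X, y)[2]


def fo_anil(B_star, width, steps=400, tasks_per_step=20, m=100,
            alpha=1.0, beta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    B = 0.1 * rng.normal(size=(B_star.shape[0], width))  # small random representation
    w = np.zeros(width)                                    # head initialised at zero
    for _ in range(steps):
        G_B, G_w = np.zeros_like(B), np.zeros_like(w)
        for _ in range(tasks_per_step):
            (X_in, y_in), (X_out, y_out) = sample_task(rng, B_star, m, m)
            w_t = adapt_head(B, w, X_in, y_in, alpha)
            # First-order: evaluate the outer gradient at (B, w_t) and apply it to (B, w),
            # ignoring how w_t depends on B and w.
            _, g_B, g_w = loss_and_grads(B, w_t, X_out, y_out)
            G_B += g_B / tasks_per_step
            G_w += g_w / tasks_per_step
        B -= beta * G_B
        w -= beta * G_w
    return B, w


def one_step_test_loss(B, w, B_star, n_tasks=200, m=100, alpha=1.0, seed=1):
    """Average loss on fresh tasks after a single inner step on the head."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(n_tasks):
        (X_in, y_in), (X_te, y_te) = sample_task(rng, B_star, m, m)
        losses.append(loss_and_grads(B, adapt_head(B, w, X_in, y_in, alpha), X_te, y_te)[0])
    return float(np.mean(losses))


d, k, width = 10, 2, 5  # width > k: the over-wide (misspecified) setting
B_star = np.linalg.qr(np.random.default_rng(42).normal(size=(d, k)))[0]  # orthonormal columns
init_loss = one_step_test_loss(*fo_anil(B_star, width, steps=0), B_star)
B, w = fo_anil(B_star, width)
trained_loss = one_step_test_loss(B, w, B_star)
print(f"one-step test loss: {init_loss:.3f} -> {trained_loss:.3f}")
```

Calling `fo_anil` with `steps=0` returns the initialisation, which gives a baseline for the one-step loss. With width 5 > k = 2, the trained one-step loss on fresh tasks should fall well below that baseline, and most of the squared Frobenius norm of B should lie in span(B*). This mirrors the claimed robustness to over-width in a small finite-task simulation; it is not a substitute for the infinite-task analysis.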