Learning new tasks by drawing on prior experience gathered from other (related) tasks is a core property of any intelligent system. Gradient-based meta-learning, especially MAML and its variants, has emerged as a viable solution to accomplish this goal. One drawback of MAML is the computational and memory burden of computing the meta-gradients. We propose a new first-order variant of MAML that we prove converges to a stationary point of the MAML objective, unlike other first-order variants. We also show that the MAML objective does not satisfy the smoothness assumption made in previous works; instead, we show that its smoothness constant grows with the norm of the meta-gradient, which theoretically motivates normalized or clipped-gradient methods over the plain gradient method used in previous works. We validate our theory on a synthetic experiment.
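To make the two ingredients of the abstract concrete, here is a minimal, hypothetical sketch (not the paper's algorithm): a first-order MAML meta-gradient, which drops the second-order Hessian term that makes exact MAML expensive, combined with a clipped-gradient outer update of the kind suggested by the growing smoothness constant. The quadratic task family, function names (`sample_task`, `fomaml_meta_grad`, `clipped_step`), and all step sizes and thresholds are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task family: task i is the quadratic f_i(w) = 0.5 * ||w - c_i||^2
# with a task-specific optimum c_i, so gradients are available in closed form.
def sample_task():
    return rng.normal(size=5)  # c_i

def grad(w, c):
    return w - c  # gradient of 0.5 * ||w - c||^2

def fomaml_meta_grad(w, tasks, inner_lr=0.1):
    """First-order MAML meta-gradient: take one inner gradient step per task,
    then average the task gradients at the adapted parameters, omitting the
    second-order (Hessian-vector) term that the exact MAML gradient requires."""
    g = np.zeros_like(w)
    for c in tasks:
        w_adapted = w - inner_lr * grad(w, c)   # inner adaptation step
        g += grad(w_adapted, c)                 # gradient at adapted params
    return g / len(tasks)

def clipped_step(w, g, outer_lr=0.5, clip_threshold=1.0):
    """Clipped outer update: rescale the meta-gradient whenever its norm
    exceeds clip_threshold, the kind of method suited to objectives whose
    smoothness constant grows with the gradient norm."""
    scale = min(1.0, clip_threshold / (np.linalg.norm(g) + 1e-12))
    return w - outer_lr * scale * g

# Meta-training loop over meta-batches of tasks.
w = rng.normal(size=5)
for step in range(200):
    tasks = [sample_task() for _ in range(8)]
    w = clipped_step(w, fomaml_meta_grad(w, tasks))
```

On these quadratic tasks the clipping is rarely active once the iterates approach a stationary point, since the meta-gradient norm shrinks; the threshold matters mainly early on, when large meta-gradients would otherwise force a small step size under a gradient-norm-dependent smoothness constant.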