Optimization-based meta-learning is gaining traction because of its unique ability to adapt quickly to a new task using only a small amount of data. However, existing optimization-based meta-learning approaches, such as MAML, ANIL, and their variants, generally employ backpropagation for upper-level gradient estimation, which requires storing historical lower-level parameters/gradients and thus increases computational and memory overhead in each iteration. In this paper, we propose a meta-learning algorithm that avoids using historical parameters/gradients and significantly reduces per-iteration memory costs compared to existing optimization-based meta-learning approaches. Beyond memory reduction, we prove that the proposed algorithm converges sublinearly in the number of upper-level iterations, and that the convergence error decays sublinearly in the batch size of sampled tasks. In the special case of deterministic meta-learning, we further prove that the proposed algorithm converges to an exact solution. Moreover, we show that the computational complexity of the algorithm is on the order of $\mathcal{O}(\epsilon^{-1})$, which matches existing convergence results for meta-learning even though no historical parameters/gradients are used. Experimental results on meta-learning benchmarks confirm the efficacy of the proposed algorithm.