Optimization-based meta-learning is gaining traction because of its unique ability to adapt quickly to a new task using only a small amount of data. However, existing optimization-based meta-learning approaches, such as MAML, ANIL, and their variants, generally employ backpropagation for upper-level gradient estimation, which requires storing historical lower-level parameters/gradients and thus increases computational and memory overhead in each iteration. In this paper, we propose a meta-learning algorithm that avoids using historical parameters/gradients and significantly reduces per-iteration memory costs compared to existing optimization-based meta-learning approaches. Beyond memory reduction, we prove that the proposed algorithm converges sublinearly in the number of upper-level iterations, and that the convergence error decays sublinearly in the batch size of sampled tasks. In the special case of deterministic meta-learning, we further prove that the proposed algorithm converges to an exact solution. Moreover, we show that the computational complexity of the algorithm is on the order of $\mathcal{O}(\epsilon^{-1})$, which matches existing convergence results for meta-learning even though no historical parameters/gradients are used. Experimental results on meta-learning benchmarks confirm the efficacy of the proposed algorithm.