Equipping a deep model with the ability of few-shot learning, i.e., learning quickly from only a few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches address this challenge by learning how to learn novel tasks. Their key idea is to train a deep model in a bi-level optimization manner: the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model using only a few labeled examples. Although these methods perform well, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes a considerable memory burden and a risk of vanishing gradients. Drawing inspiration from recent progress in diffusion models, we find that the inner-loop gradient descent process can be viewed as a reverse (i.e., denoising) diffusion process in which the target of denoising is the model weights rather than the original data. Based on this observation, we propose to model the gradient descent optimizer as a diffusion model and present a novel task-conditional diffusion-based meta-learning approach, called MetaDiff, that models the optimization of model weights from Gaussian noise to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, MetaDiff does not need to differentiate through the inner-loop path, so the memory burden and the risk of vanishing gradients are effectively alleviated. Experimental results show that MetaDiff outperforms state-of-the-art gradient-based meta-learning methods on few-shot learning tasks.
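The analogy between inner-loop gradient descent and denoising can be illustrated with a toy sketch. This is not MetaDiff itself: the function name `inner_loop_as_denoising`, the least-squares task, the step count, and the learning rate are all illustrative assumptions, and an analytic gradient stands in for the learned, task-conditional denoising model that the abstract describes. The sketch only shows weights starting as Gaussian noise and moving step by step toward the target weights.

```python
import numpy as np


def inner_loop_as_denoising(X, y, steps=50, lr=0.1, seed=0):
    """Plain gradient descent on a least-squares task, started from Gaussian noise.

    Hypothetical helper for illustration only. Returns the final weights and
    the full weight trajectory (initial noise first, final weights last).
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])  # initial weights: pure Gaussian noise
    trajectory = [w.copy()]
    for _ in range(steps):
        # Gradient of 0.5 * mean squared error w.r.t. w.
        # Assumption: a learned denoiser would replace this analytic term.
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad  # one gradient step, read as one "denoising" step
        trajectory.append(w.copy())
    return w, trajectory


# A toy few-shot task: 5 labeled examples, 3 features (values are made up).
rng = np.random.default_rng(1)
w_target = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((5, 3))
y = X @ w_target

w_final, traj = inner_loop_as_denoising(X, y)

# Distance to the target weights after each step; it shrinks as steps proceed.
errors = [float(np.linalg.norm(w - w_target)) for w in traj]
print(f"initial error: {errors[0]:.3f}, final error: {errors[-1]:.3f}")
```

Note that this toy loop involves no outer loop and no second-order derivatives; it only visualizes the "noise to target weights" trajectory that the denoising view refers to.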