Equipping a deep model with the ability of few-shot learning, i.e., learning quickly from only a few examples, is a core challenge for artificial intelligence. Gradient-based meta-learning approaches address this challenge by learning how to learn novel tasks. Their key idea is to train a deep model in a bi-level optimization manner, where the outer-loop process learns a shared gradient descent algorithm (i.e., its hyperparameters), while the inner-loop process leverages it to optimize a task-specific model using only a few labeled examples. Although these existing methods have shown superior performance, the outer-loop process requires calculating second-order derivatives along the inner optimization path, which imposes a considerable memory burden and the risk of vanishing gradients. Drawing inspiration from recent progress in diffusion models, we find that the inner-loop gradient descent process can actually be viewed as a reverse (i.e., denoising) process of diffusion, where the target of denoising is the model weights rather than the original data. Based on this observation, we propose to model the gradient descent optimizer as a diffusion model and present a novel task-conditional diffusion-based meta-learning method, called MetaDiff, that models the optimization of model weights from Gaussian noise to target weights in a denoising manner. Thanks to the training efficiency of diffusion models, MetaDiff does not need to differentiate through the inner-loop path, so the memory burden and the risk of vanishing gradients are effectively alleviated. Experimental results show that MetaDiff outperforms state-of-the-art gradient-based meta-learning methods on few-shot learning tasks.
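To make the analogy between inner-loop gradient descent and denoising concrete, the following is a minimal toy sketch, not the MetaDiff method itself. It assumes a linear model with a squared-error loss on a small support set, and it uses the plain loss gradient where a learned, task-conditional denoiser would presumably sit. All names (`mse_and_grad`, `denoise_weights`, `support_x`, `target_w`) are hypothetical and chosen only for illustration.

```python
import random


def mse_and_grad(w, xs, ys):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    n = len(ys)
    residuals = [sum(wi * xi for wi, xi in zip(w, x)) - y
                 for x, y in zip(xs, ys)]
    loss = sum(r * r for r in residuals) / n
    grad = [2.0 * sum(r * x[j] for r, x in zip(residuals, xs)) / n
            for j in range(len(w))]
    return loss, grad


def denoise_weights(xs, ys, num_steps=200, step_size=0.1, seed=0):
    """Iteratively refine weights, starting from Gaussian noise.

    Assumption: the plain gradient stands in for a learned denoiser;
    this is ordinary gradient descent viewed as a denoising chain.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in xs[0]]  # w_T: pure Gaussian noise
    losses = []
    for _ in range(num_steps):
        loss, grad = mse_and_grad(w, xs, ys)
        losses.append(loss)
        # One "denoising" step: move the noisy weights toward the target.
        w = [wi - step_size * gi for wi, gi in zip(w, grad)]
    return w, losses


# Toy few-shot task: four labeled support examples, two features.
support_x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
target_w = [1.5, -0.5]
support_y = [sum(wi * xi for wi, xi in zip(target_w, x)) for x in support_x]

w_final, losses = denoise_weights(support_x, support_y)
```

In this sketch the step index plays the role of the diffusion timestep: the weights start as noise and each update removes part of the discrepancy from the task's target weights. The loop only runs forward, and nothing here differentiates through the chain of updates.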