A Hierarchical Bayesian Model for Deep Few-Shot Meta Learning

We propose a novel hierarchical Bayesian model for learning with a large (possibly infinite) number of tasks/episodes, which is well suited to the few-shot meta learning problem. We use episode-wise random variables to model episode-specific target generative processes, and these local random variables are governed by a higher-level global random variable. The global variable helps memorize the important information from past episodes while controlling, in a principled Bayesian manner, how much the model needs to be adapted to new episodes. Within our framework, prediction on a novel episode/task can be seen as a Bayesian inference problem. However, a main obstacle to learning with a large/infinite number of local random variables in an online setting is that one cannot store the posterior distribution of the current local random variable for frequent future updates, as is typical in conventional variational inference. We need to be able to treat each local variable as a one-time iterate in the optimization. We propose a Normal-Inverse-Wishart model, for which we show that this one-time iterate optimization becomes feasible thanks to approximate closed-form solutions for the local posterior distributions. The resulting algorithm is more attractive than MAML in that it does not need to maintain computational graphs for all of the gradient optimization steps in each episode. Our approach also differs from existing Bayesian meta learning methods: rather than using a single random variable for all episodes, it has a hierarchical structure that allows one-time episodic optimization, which is desirable for principled Bayesian learning with many/infinite tasks. The code is available at \url{https://github.com/minyoungkim21/niwmeta}.
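
To give a feel for why a Normal-Inverse-Wishart (NIW) prior lends itself to closed-form updates, the following is a minimal sketch of the textbook NIW conjugate update for Gaussian observations. It is not the approximate local posterior derived in this work, and it is not taken from the released code; the function name `niw_posterior` and the parameter names (`mu0`, `kappa0`, `nu0`, `psi0`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def niw_posterior(x, mu0, kappa0, nu0, psi0):
    """Textbook NIW conjugate update (illustrative; names are hypothetical).

    Prior:  Sigma ~ InverseWishart(psi0, nu0),  mu | Sigma ~ N(mu0, Sigma / kappa0)
    Data:   rows of x are i.i.d. draws from N(mu, Sigma)
    Returns the posterior parameters (mu_n, kappa_n, nu_n, psi_n).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    mu0 = np.asarray(mu0, dtype=float)
    n = x.shape[0]

    # Sufficient statistics of the batch: sample mean and scatter matrix.
    xbar = x.mean(axis=0)
    centered = x - xbar
    scatter = centered.T @ centered

    # Closed-form posterior parameters; no iterative optimization is needed.
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    diff = (xbar - mu0)[:, None]
    psi_n = np.asarray(psi0, dtype=float) + scatter + (kappa0 * n / kappa_n) * (diff @ diff.T)
    return mu_n, kappa_n, nu_n, psi_n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10, 3))
    # Process the data in two chunks, feeding each posterior in as the next prior.
    state = (np.zeros(3), 1.0, 5.0, np.eye(3))
    for chunk in (data[:4], data[4:]):
        state = niw_posterior(chunk, *state)
    print(state[0])  # posterior mean after seeing both chunks once each
```

Because the update is conjugate, feeding each posterior back in as the next prior reproduces the single-batch result, so in this simplified setting each chunk of data needs to be visited only once. This is only an analogy for the one-time episodic optimization described above, under the assumption of directly observed Gaussian data.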
