Hypernetwork approach to Bayesian MAML

The main goal of few-shot learning algorithms is to enable learning from small amounts of data. One of the most popular and elegant few-shot learning approaches is Model-Agnostic Meta-Learning (MAML). The main idea behind this method is to learn the shared universal weights of a meta-model, which are then adapted for specific tasks. However, because each task provides only a small amount of data, the method suffers from over-fitting and quantifies uncertainty poorly. Bayesian approaches could, in principle, alleviate these shortcomings by learning weight distributions in place of point-wise weights. Unfortunately, previous Bayesian modifications of MAML are limited by the simplicity of Gaussian posteriors, by MAML-like gradient-based weight updates, or by the same structure being enforced for universal and adapted weights. In this paper, we propose a novel framework for Bayesian MAML called BayesianHMAML, which employs Hypernetworks for weight updates. It learns the universal weights point-wise, but a probabilistic structure is added when they are adapted for specific tasks. In such a framework, we can use simple Gaussian distributions or more complicated posteriors induced by Continuous Normalizing Flows.
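
The idea of point-wise universal weights combined with a probabilistic, hypernetwork-generated update can be illustrated with a toy sketch. This is not the authors' architecture: the linear "hypernetwork", the scalar task summary (mean of the support targets), the two-weight linear model, and all function and parameter names below are hypothetical choices made only for illustration. The sketch covers the simple Gaussian case, not Continuous Normalizing Flows.

```python
import math
import random


def hypernetwork(task_summary, hyper_params):
    """Toy stand-in for a hypernetwork (assumption, not the paper's model).

    Maps a scalar task summary to the mean and log-std of a Gaussian
    over the update of each weight. Each entry of hyper_params is a
    tuple (a_mu, b_mu, a_ls, b_ls) of linear coefficients.
    """
    means, log_stds = [], []
    for a_mu, b_mu, a_ls, b_ls in hyper_params:
        means.append(a_mu * task_summary + b_mu)
        log_stds.append(a_ls * task_summary + b_ls)
    return means, log_stds


def sample_adapted_weights(universal_weights, means, log_stds, rng):
    """Reparameterized sample: theta_task = theta + mu + sigma * eps."""
    return [
        theta + mu + math.exp(ls) * rng.gauss(0.0, 1.0)
        for theta, mu, ls in zip(universal_weights, means, log_stds)
    ]


def predict(weights, x):
    """Hypothetical task model: a one-dimensional linear regressor."""
    w, b = weights
    return w * x + b


def predictive_mean_and_std(universal_weights, hyper_params, support, x,
                            n_samples=200, seed=0):
    """Monte Carlo estimate of the prediction and its spread at x."""
    rng = random.Random(seed)
    # Hypothetical task embedding: mean of the support-set targets.
    task_summary = sum(y for _, y in support) / len(support)
    means, log_stds = hypernetwork(task_summary, hyper_params)
    preds = [
        predict(sample_adapted_weights(universal_weights, means, log_stds, rng), x)
        for _ in range(n_samples)
    ]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, math.sqrt(var)
```

In this sketch the universal weights stay a single point estimate, and only the task-specific update is random, so the spread of the sampled predictions serves as a rough uncertainty estimate for the task. Note that the adaptation here is a single forward pass of the hypernetwork, with no gradient steps on the support set.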

Related content

MAML

MAML (Model-Agnostic Meta-Learning) is one of the classic meta-learning algorithms, introduced in the paper "Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks". Paper: https://arxiv.org/abs/1703.03400
