On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization

Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists of three components: a classifier, an auxiliary network, and a bridge network. While the classifier performs the main classification task, the auxiliary network learns to predict language representations from the same input, and the bridge network transforms high-level features of the auxiliary network into modulation parameters for layers of the few-shot classifier using conditional batch normalization. The bridge should encourage a form of lightweight semantic alignment between language and vision, which could be useful for the classifier. However, after evaluating the proposed approach on two popular few-shot classification benchmarks, we find that (a) the improvements do not reproduce across benchmarks, and (b) when they do, the improvements are due to the additional compute and parameters introduced by the bridge network. We contribute insights and recommendations for future work in multi-modal meta-learning, especially when using language representations.
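
The modulation path described in the abstract can be illustrated with a minimal NumPy sketch. It is an assumption-laden illustration, not the authors' implementation: the `Bridge` class, its single linear layer, the residual form `gamma = 1 + ...`, and all dimensions are hypothetical choices made here for clarity. It only shows how auxiliary features could be turned into per-example scale and shift parameters for conditional batch normalization.

```python
import numpy as np


def conditional_batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize x over the batch axis, then apply a per-example scale and shift.

    x:     (batch, channels) activations from a classifier layer
    gamma: (batch, channels) scale predicted by the bridge
    beta:  (batch, channels) shift predicted by the bridge
    """
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta


class Bridge:
    """Hypothetical linear bridge mapping auxiliary features to (gamma, beta)."""

    def __init__(self, aux_dim, channels, rng):
        # Small random weights; a real model would learn these (assumption).
        self.w_gamma = rng.normal(0.0, 0.01, size=(aux_dim, channels))
        self.w_beta = rng.normal(0.0, 0.01, size=(aux_dim, channels))

    def __call__(self, aux_features):
        # Predict offsets around the identity modulation (gamma=1, beta=0),
        # so zero auxiliary features reduce to plain batch normalization.
        gamma = 1.0 + aux_features @ self.w_gamma
        beta = aux_features @ self.w_beta
        return gamma, beta


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))      # classifier activations (illustrative sizes)
aux = rng.normal(size=(8, 32))    # high-level auxiliary-network features
bridge = Bridge(aux_dim=32, channels=16, rng=rng)

gamma, beta = bridge(aux)
out = conditional_batch_norm(x, gamma, beta)
```

In this sketch, feeding all-zero auxiliary features yields `gamma = 1` and `beta = 0`, so the layer falls back to ordinary batch normalization. Note that the bridge weights are extra parameters on top of the classifier, which is the kind of added capacity the abstract's finding (b) refers to.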

