Less is More: On the Feature Redundancy of Pretrained Models When Transferring to Few-shot Tasks

Transferring a pretrained model to a downstream task can be as easy as conducting linear probing with target data, that is, training a linear classifier upon frozen features extracted from the pretrained model. As there may exist significant gaps between pretraining and downstream datasets, one may ask whether all dimensions of the pretrained features are useful for a given downstream task. We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce, or few-shot. For some cases such as 5-way 1-shot tasks, using only 1% of the most important feature dimensions is able to recover the performance achieved by using the full representation. Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful when the number of shots increases, suggesting that feature redundancy may be the key to characterizing the "few-shot" nature of few-shot transfer problems. We give a theoretical understanding of this phenomenon and show how dimensions with high variance and small distance between class centroids can serve as confounding factors that severely disturb classification results under few-shot settings. As an attempt at solving this problem, we find that the redundant features are difficult to identify accurately with a small number of training samples, but we can instead adjust feature magnitude with a soft mask based on estimated feature importance. We show that this method can generally improve few-shot transfer performance across various pretrained models and downstream datasets.
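To make the idea of masking frozen features concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the importance score (spread of class centroids divided by within-class variance), the `keep_ratio` and `temperature` parameters, and all function names are assumptions chosen to mirror the abstract's description, and a nearest-centroid classifier stands in for the trained linear probe.

```python
import numpy as np

def feature_importance(x, y, eps=1e-8):
    """Assumed Fisher-style score per dimension: variance of the class
    centroids divided by the mean within-class variance. Dimensions with
    high variance and close centroids receive a low score."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    between = centroids.var(axis=0)
    # With one shot per class the within-class variance is zero, so the
    # score then depends on the centroid spread alone.
    within = np.mean([x[y == c].var(axis=0) for c in classes], axis=0)
    return between / (within + eps)

def hard_mask(score, keep_ratio):
    """Binary mask that keeps the top `keep_ratio` fraction of dimensions."""
    k = max(1, int(round(keep_ratio * score.size)))
    mask = np.zeros_like(score)
    mask[np.argsort(score)[-k:]] = 1.0
    return mask

def soft_mask(score, temperature=1.0):
    """Hypothetical soft mask: scores rescaled to [0, 1], then sharpened
    (temperature > 1) or flattened (temperature < 1)."""
    scaled = score / (score.max() + 1e-8)
    return scaled ** temperature

def nearest_centroid_predict(x_train, y_train, x_test, mask):
    """Classify masked test features by the nearest masked class centroid
    (a simple stand-in for a trained linear classifier)."""
    classes = np.unique(y_train)
    centroids = np.stack(
        [(x_train[y_train == c] * mask).mean(axis=0) for c in classes]
    )
    diffs = (x_test * mask)[:, None, :] - centroids[None, :, :]
    return classes[(diffs ** 2).sum(axis=-1).argmin(axis=1)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 5-way 5-shot support set with 64-dimensional "frozen features".
    y_support = np.repeat(np.arange(5), 5)
    x_support = rng.normal(size=(25, 64))
    x_query = rng.normal(size=(10, 64))

    score = feature_importance(x_support, y_support)
    for name, mask in [("hard", hard_mask(score, 0.1)), ("soft", soft_mask(score))]:
        pred = nearest_centroid_predict(x_support, y_support, x_query, mask)
        print(name, pred)
```

The hard mask corresponds to keeping only a small fraction of dimensions, while the soft mask rescales feature magnitudes instead of discarding dimensions outright, which is less sensitive to importance estimates made from very few samples.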


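Few-shot learning

Few-shot learning (FSL) addresses how to improve the performance of neural networks when only a small amount of data is available. A family of methods commonly used in FSL is meta-learning. Like ordinary neural network training, meta-learning has a training phase and a testing phase, but they are called meta-training and meta-testing, respectively.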
