Transferring a pretrained model to a downstream task can be as simple as linear probing on the target data, that is, training a linear classifier on frozen features extracted from the pretrained model. Because the gap between pretraining and downstream datasets can be significant, one may ask whether all dimensions of the pretrained features are useful for a given downstream task. We show that, for linear probing, the pretrained features can be extremely redundant when the downstream data is scarce, i.e., in the few-shot setting. In some cases, such as 5-way 1-shot tasks, using only the 1% most important feature dimensions recovers the performance achieved with the full representation. Interestingly, most dimensions are redundant only under few-shot settings and gradually become useful as the number of shots increases, suggesting that feature redundancy may be the key to characterizing the "few-shot" nature of few-shot transfer problems. We provide a theoretical explanation of this phenomenon and show how dimensions with high variance and a small distance between class centroids can act as confounding factors that severely disturb classification results under few-shot settings. In attempting to address this problem, we find that the redundant features are difficult to identify accurately from a small number of training samples, but that feature magnitudes can instead be adjusted with a soft mask based on estimated feature importance. We show that this method generally improves few-shot transfer performance across various pretrained models and downstream datasets.
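To make the idea of importance-based masking concrete, the following is a minimal sketch rather than the method evaluated in this work. It assumes a Fisher-style score (variance of the class centroids divided by the mean within-class variance), chosen only because it penalizes the dimensions described above: high variance and a small distance between class centroids. The function names `fisher_importance`, `soft_mask`, and `top_fraction_mask`, the `power` parameter, and the synthetic data are all hypothetical. The score needs at least two shots per class, since the within-class variance of a single sample is zero.

```python
import numpy as np


def fisher_importance(features, labels, eps=1e-8):
    """Assumed per-dimension score: variance of class centroids divided by
    the mean within-class variance (needs >= 2 samples per class)."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    between = centroids.var(axis=0)
    within = np.mean([features[labels == c].var(axis=0) for c in classes], axis=0)
    return between / (within + eps)


def soft_mask(importance, power=0.5):
    """Map scores to weights in [0, 1]; the most important dimension gets ~1.
    `power` is an assumed knob: < 1 flattens the mask, > 1 sharpens it."""
    scaled = importance / (importance.max() + 1e-12)
    return scaled ** power


def top_fraction_mask(importance, fraction=0.01):
    """Hard 0/1 mask keeping only the top `fraction` of dimensions."""
    k = max(1, int(round(fraction * importance.size)))
    mask = np.zeros_like(importance)
    mask[np.argsort(importance)[-k:]] = 1.0
    return mask


# Synthetic 5-way 5-shot episode: only the first 5 of 100 dimensions
# carry class information; the remaining 95 are pure noise.
rng = np.random.default_rng(0)
n_way, n_shot, dim, n_informative = 5, 5, 100, 5
labels = np.repeat(np.arange(n_way), n_shot)
features = rng.normal(0.0, 1.0, size=(n_way * n_shot, dim))
features[:, :n_informative] = 10.0 * labels[:, None] + rng.normal(
    0.0, 0.1, size=(n_way * n_shot, n_informative)
)

importance = fisher_importance(features, labels)
# Soft masking: rescale every dimension before fitting the linear probe.
masked_features = features * soft_mask(importance)
# Hard selection: indices of the top 5% most important dimensions.
selected = np.flatnonzero(top_fraction_mask(importance, fraction=0.05))
```

The rescaled `masked_features` would then replace the raw frozen features as input to the linear classifier. The soft mask shrinks low-scoring dimensions instead of discarding them, which tolerates noisy importance estimates better than the hard top-fraction selection.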