Recently, efficient fine-tuning of large-scale pre-trained models has attracted increasing research interest, where linear probing (LP), as a fundamental module, exploits the final representations for task-dependent classification. However, most existing methods focus on how to effectively introduce a small number of learnable parameters, and little work pays attention to the commonly used LP module itself. In this paper, we propose a novel Moment Probing (MP) method to further explore the potential of LP. Unlike LP, which builds a linear classification head on the mean of final features (e.g., word tokens for ViT) or on classification tokens, our MP applies a linear classifier to the feature distribution, which provides stronger representation ability by exploiting the richer statistical information inherent in features. Specifically, we represent the feature distribution by its characteristic function, which is efficiently approximated using the first- and second-order moments of features. Furthermore, we propose a multi-head convolutional cross-covariance (MHC$^3$) to compute second-order moments efficiently and effectively. Considering that MP could affect feature learning, we introduce a partially shared module to learn two recalibrating parameters (PSRP) for backbones based on MP, yielding MP$_{+}$. Extensive experiments on ten benchmarks using various models show that our MP significantly outperforms LP and is competitive with counterparts at lower training cost, while our MP$_{+}$ achieves state-of-the-art performance.
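To make the idea of probing with moments concrete, the following is a minimal NumPy sketch, not the authors' implementation. It computes the first-order moment (the token mean that LP already uses) and a plain covariance as the second-order moment, then applies a linear classifier to both. It does not implement MHC$^3$ or PSRP. The function names, the projection matrix `proj` (a hypothetical dimension reduction that keeps the covariance small), and the choice to sum the two linear branches are all assumptions made for illustration.

```python
import numpy as np


def moment_features(tokens, proj):
    """Return first- and second-order moments of one image's final features.

    tokens: (n_tokens, dim) array of final-layer token features.
    proj:   (dim, r) hypothetical reduction matrix (an assumption, not from the paper).
    """
    # First-order moment: the mean over tokens, shape (dim,).
    mean = tokens.mean(axis=0)

    # Second-order moment: covariance of the reduced features, shape (r, r).
    reduced = tokens @ proj
    centered = reduced - reduced.mean(axis=0)
    cov = centered.T @ centered / tokens.shape[0]

    # The covariance is symmetric, so keep only its upper triangle.
    upper = np.triu_indices(cov.shape[0])
    return mean, cov[upper]


def moment_probe_logits(tokens, proj, w1, w2, bias):
    """Linear classifier over both moments (sum of two linear branches)."""
    mean, second = moment_features(tokens, proj)
    return mean @ w1 + second @ w2 + bias


# Usage with random data standing in for ViT token features.
rng = np.random.default_rng(0)
n_tokens, dim, r, n_classes = 196, 768, 16, 10
tokens = rng.standard_normal((n_tokens, dim))
proj = rng.standard_normal((dim, r)) / np.sqrt(dim)
w1 = rng.standard_normal((dim, n_classes)) * 0.01
w2 = rng.standard_normal((r * (r + 1) // 2, n_classes)) * 0.01
bias = np.zeros(n_classes)

logits = moment_probe_logits(tokens, proj, w1, w2, bias)
print(logits.shape)  # (10,)
```

Dropping the `second @ w2` term recovers ordinary linear probing on the token mean, which makes the sketch a convenient baseline for comparing the two heads under the same assumptions.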