As fine-tuning becomes impractical at scale, probing is emerging as the preferred evaluation protocol. However, standard linear probing can understate the capability of models whose pre-training optimizes local representations rather than an explicit global representation. This motivates attentive probing, an alternative that uses attention to selectively aggregate patch-level features. Despite growing adoption, attentive probing is still underexplored: existing approaches are often over-parameterized and computationally inefficient. In this work, we revisit attentive probing through the lens of the accuracy vs. parameter-efficiency trade-off. We present the first comprehensive study of existing methods, analyzing their design choices and benchmarking their performance. Building on these insights, we propose efficient probing (EP), a lightweight yet effective multi-query cross-attention mechanism that eliminates redundant projections and reduces the number of trainable parameters. Across multiple benchmarks and pre-training paradigms, EP consistently outperforms linear probing and previous attentive probing methods, and remains effective when combined with parameter-efficient fine-tuning. Beyond evaluation, our analysis uncovers emerging properties of EP, including complementary attention maps, which open new directions for leveraging probing beyond protocol design. Project page: https://vrg.fel.cvut.cz/ep/.
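The core of EP as described above is a multi-query cross-attention pooling step without the redundant projections of a full transformer block. A minimal NumPy sketch of that idea, assuming the queries attend directly over frozen patch features with no key/value projection matrices (the exact parameterization is an assumption for illustration; shapes follow a ViT-B/16-style encoder):

```python
import numpy as np

def softmax(z, axis=-1):
    # numerically stable softmax
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_pool(patches, queries):
    """Pool patch features with k learnable queries via cross-attention.

    patches: (n, d) frozen patch-level features from the encoder
    queries: (k, d) trainable query vectors -- the only new parameters,
             since keys and values are the raw patch features themselves
             (an assumed simplification, not the paper's exact design).
    Returns (k, d) pooled features, one per query.
    """
    d = patches.shape[-1]
    attn = softmax(queries @ patches.T / np.sqrt(d), axis=-1)  # (k, n)
    return attn @ patches                                      # (k, d)

rng = np.random.default_rng(0)
n, d, k = 196, 768, 4            # illustrative: 14x14 patches, width 768, 4 queries
patches = rng.standard_normal((n, d))
queries = 0.02 * rng.standard_normal((k, d))  # only k*d trainable parameters
pooled = multi_query_pool(patches, queries)
print(pooled.shape)  # (4, 768)
```

The pooled outputs would then be concatenated (or averaged) and fed to a linear classifier; each query can specialize, which is consistent with the complementary attention maps mentioned in the abstract.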