Rethinking Global Average Pooling: Your Classifier Is Secretly a Multi-Instance Learner

Modern image classifiers widely adopt global average pooling (GAP) followed by a linear classification head. This linearity ensures that the image-level logits equal the average of logits obtained by applying the classification head pointwise to the feature grid prior to GAP. Consequently, standard classifiers may inherently retain spatial class evidence that remains recoverable even when the image-level prediction is incorrect. This structure naturally suggests a multiple-instance learning (MIL) interpretation, where an image is viewed as a bag of spatial instances. Within this formulation, we demonstrate that standard classifiers trained with a single label per image can still learn the intended classification task in multi-object scenes. We further exploit this property to decompose image-level logits into a prediction grid, providing a post-hoc diagnostic to extract spatial class evidence that GAP otherwise obscures. Our systematic evaluation reveals that off-the-shelf models consistently recover the ground-truth class within foreground regions. The MIL interpretation further suggests that common classifier failures reflect known limitations of mean aggregation.
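The linearity argument can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a feature grid of shape (H, W, C) and a linear head with weight (K, C) and bias (K,), all filled with random values, and the function names `gap_logits` and `prediction_grid` are hypothetical. It shows that pooling first and then classifying gives the same logits as classifying every spatial location and then averaging.

```python
import numpy as np

def gap_logits(features, weight, bias):
    """Image-level logits: global average pooling, then the linear head."""
    pooled = features.mean(axis=(0, 1))   # (C,)
    return pooled @ weight.T + bias       # (K,)

def prediction_grid(features, weight, bias):
    """Apply the same linear head at every spatial location: (H, W, K)."""
    return features @ weight.T + bias

# Illustrative random stand-ins for a backbone's pre-GAP features and head.
rng = np.random.default_rng(0)
H, W, C, K = 7, 7, 16, 5
features = rng.normal(size=(H, W, C))
weight = rng.normal(size=(K, C))
bias = rng.normal(size=K)

image_logits = gap_logits(features, weight, bias)
grid = prediction_grid(features, weight, bias)

# Averaging the per-location logits recovers the image-level logits.
mean_of_grid = grid.mean(axis=(0, 1))
assert np.allclose(image_logits, mean_of_grid)

# Per-location class decisions, i.e. the "bag of instances" view.
per_location_class = grid.argmax(axis=-1)  # (H, W)
```

Because the bias is added at every location and then averaged, it contributes the same term in both orderings, so the identity holds with or without a bias. The per-location argmax is one simple way to read the grid; other readouts are possible.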

