PixelCAM: Pixel Class Activation Mapping for Histology Image Classification and ROI Localization

Weakly supervised object localization (WSOL) methods allow training models to classify images and localize ROIs. WSOL only requires low-cost image-class annotations yet provides a visually interpretable classifier, which is important in histology image analysis. Standard WSOL methods rely on class activation mapping (CAM) methods to produce spatial localization maps according to a single- or two-step strategy. While both strategies have made significant progress, they still face several limitations with histology images. Single-step methods can easily result in under- or over-activation due to the limited visual ROI saliency in histology images and the limited localization cues. They also face the well-known issue of asynchronous convergence between classification and localization tasks. The two-step approach is sub-optimal because it is tied to a frozen classifier, limiting the capacity for localization. Moreover, these methods also struggle when applied to out-of-distribution (OOD) datasets. In this paper, a multi-task approach for WSOL is introduced for simultaneous training of both tasks to address the asynchronous convergence problem. In particular, localization is performed in the pixel-feature space of an image encoder that is shared with classification. This allows learning discriminant features and accurate delineation of foreground/background regions to support ROI localization and image classification. We propose PixelCAM, a cost-effective foreground/background pixel-wise classifier in the pixel-feature space that allows for spatial object localization. PixelCAM is trained using pixel pseudo-labels collected from a pretrained WSOL model. Both image and pixel-wise classifiers are trained simultaneously using standard gradient descent. In addition, our pixel classifier can easily be integrated into CNN- and transformer-based architectures without any modifications.
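The multi-task idea described above (an image classifier and a foreground/background pixel classifier sharing one encoder's pixel features, trained together) can be illustrated with a minimal pure-Python sketch of a joint loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the global-average-pooling image head, the linear pixel head, the `IGNORE` label for pixels without a pseudo-label, the weighting factor `lam`, and all function names. The gradient step and the pretrained WSOL model that would supply the pseudo-labels are omitted.

```python
import math

IGNORE = -1  # hypothetical marker for pixels that received no pseudo-label


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy of a single sample against an integer class index."""
    return -math.log(softmax(logits)[target])


def linear(weights, bias, x):
    """Linear layer: one weight row and one bias value per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def global_average_pool(feature_map):
    """Average an H x W x D feature map (nested lists) into a D-dim vector."""
    pixels = [px for row in feature_map for px in row]
    dim = len(pixels[0])
    return [sum(px[d] for px in pixels) / len(pixels) for d in range(dim)]


def joint_loss(feature_map, image_label, pseudo_labels,
               img_w, img_b, pix_w, pix_b, lam=1.0):
    """Image-level loss plus pixel-level loss on the shared features.

    pseudo_labels is an H x W grid with 0 (background), 1 (foreground),
    or IGNORE. Returns (total, image_loss, pixel_loss).
    """
    # Image classification head on the pooled pixel features.
    image_logits = linear(img_w, img_b, global_average_pool(feature_map))
    loss_cls = cross_entropy(image_logits, image_label)

    # Pixel classification head applied to each pseudo-labeled pixel embedding.
    pixel_losses = []
    for feat_row, label_row in zip(feature_map, pseudo_labels):
        for px, label in zip(feat_row, label_row):
            if label != IGNORE:
                pixel_losses.append(
                    cross_entropy(linear(pix_w, pix_b, px), label))
    loss_pix = sum(pixel_losses) / len(pixel_losses) if pixel_losses else 0.0

    return loss_cls + lam * loss_pix, loss_cls, loss_pix


def foreground_map(feature_map, pix_w, pix_b):
    """Per-pixel foreground probability, usable as a localization map."""
    return [[softmax(linear(pix_w, pix_b, px))[1] for px in row]
            for row in feature_map]


# Toy 2 x 2 feature map with 2-dimensional pixel embeddings.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.5, 0.5], [1.0, 1.0]]]
labels = [[1, 0],
          [IGNORE, 1]]
zero_w = [[0.0, 0.0], [0.0, 0.0]]
zero_b = [0.0, 0.0]

total, loss_cls, loss_pix = joint_loss(
    features, 1, labels, zero_w, zero_b, zero_w, zero_b, lam=1.0)
fg = foreground_map(features, zero_w, zero_b)
```

Because both loss terms depend on the same feature map, a single optimizer step on `total` would update the shared encoder for classification and localization at once, which is the property the abstract relies on to avoid asynchronous convergence. In a real model the nested lists would be encoder output tensors and the two heads would be trainable layers.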
