Bag-based Multiple Instance Learning (MIL) approaches have emerged as the mainstream methodology for Whole Slide Image (WSI) classification. However, most existing methods adopt a two-stage training strategy: features are first extracted by a pre-trained feature extractor and then aggregated by a MIL network. Decoupling these stages leaves the feature extractor and the MIL network insufficiently co-optimized, preventing end-to-end joint training and thereby limiting the overall performance of the model. Additionally, conventional methods extract features from all patches at a single fixed size, ignoring the multi-scale manner in which pathologists examine slides. This not only wastes substantial computational resources when tumor regions occupy only a small fraction of the slide (as in the Camelyon16 dataset) but may also drive the model toward suboptimal solutions. To address these limitations, this paper proposes an end-to-end multi-scale WSI classification framework that integrates multi-scale feature extraction with multiple instance learning. Specifically, our approach comprises: (1) a semantic feature filtering module that reduces interference from non-lesion regions; (2) a multi-scale feature extraction module that captures pathological information at different levels; and (3) a multi-scale fusion MIL module for global modeling and feature integration. Through an end-to-end training strategy, we jointly optimize the feature extractor and the MIL network, ensuring maximum compatibility between them. Experiments on three cross-center datasets (DigestPath2019, BCNB, and UBC-OCEAN) demonstrate that the proposed method outperforms existing state-of-the-art approaches in both accuracy (ACC) and AUC.
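To make the bag-based MIL aggregation concrete, the sketch below shows a common instantiation of the idea: attention-weighted pooling of per-patch features into a single bag (slide) embedding. This is a generic NumPy illustration of MIL pooling, not the paper's specific multi-scale fusion MIL module; the weight matrices and dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_pool(instances, w_proj, w_attn):
    """Generic attention-based MIL pooling (illustrative, not the paper's module).

    instances: (n, d) array of per-patch features for one bag (slide).
    w_proj:    (d, k) hypothetical projection into an attention space.
    w_attn:    (k,)   hypothetical attention scoring vector.
    Returns the (d,) bag embedding and the (n,) attention weights.
    """
    h = np.tanh(instances @ w_proj)   # (n, k) hidden representation
    scores = h @ w_attn               # (n,) one scalar score per patch
    a = softmax(scores)               # attention weights, sum to 1
    return a @ instances, a           # weighted sum over patches

rng = np.random.default_rng(0)
bag = rng.normal(size=(8, 16))        # 8 patch features of dimension 16
w_proj = rng.normal(size=(16, 4))
w_attn = rng.normal(size=(4,))
emb, weights = attention_mil_pool(bag, w_proj, w_attn)
print(emb.shape)          # (16,)
print(weights.sum())      # ~1.0
```

In a two-stage pipeline the patch features would come from a frozen pre-trained extractor; the end-to-end setting the abstract advocates instead backpropagates the bag-level loss through both the pooling weights and the extractor itself.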