Because it uses annotations efficiently and scales to gigapixel-sized images, multiple instance learning (MIL) has shown great promise as a framework for whole slide image (WSI) classification in digital pathology diagnosis. However, existing methods tend to focus on increasingly elaborate aggregator architectures and often overlook the intrinsic features of H\&E pathological slides. To address this limitation, we introduce two pathological priors: the nuclear heterogeneity of diseased cells and the spatial correlation of pathological tiles. Leveraging the former, we propose a data augmentation method that applies stain separation while the feature extractor is trained with a contrastive learning strategy, yielding instance-level representations. We then describe the spatial relationships between tiles with an adjacency matrix. By integrating these two views, we design a MIL framework for analyzing H\&E-stained tissue images based on pathological inductive bias, encompassing feature extraction, filtering, and aggregation. Extensive experiments on the Camelyon16 breast dataset and the TCGA-NSCLC lung dataset demonstrate that the proposed framework effectively handles cancer detection and subtype differentiation, outperforming state-of-the-art MIL-based methods for medical image classification. The code will be released later.
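As an illustration of the spatial prior, the following minimal sketch builds a binary adjacency matrix over tiles from their grid positions. The abstract does not specify how the adjacency matrix is constructed, so the function name `build_adjacency`, the `(row, col)` coordinate convention, and the 4- or 8-neighbour rule are all assumptions made for this example rather than the authors' method.

```python
from typing import List, Tuple


def build_adjacency(coords: List[Tuple[int, int]], connectivity: int = 8) -> List[List[int]]:
    """Return a symmetric 0/1 adjacency matrix over tiles.

    `coords` holds hypothetical, unique (row, col) grid indices of the tissue
    tiles kept after background filtering. Two tiles count as neighbours when
    they touch on the grid (edge-wise for 4, edge- or corner-wise for 8).
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")

    # Map each grid position to its tile index for constant-time lookup.
    index = {rc: i for i, rc in enumerate(coords)}
    n = len(coords)
    adj = [[0] * n for _ in range(n)]

    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]

    for (r, c), i in index.items():
        for dr, dc in offsets:
            j = index.get((r + dr, c + dc))
            if j is not None:
                adj[i][j] = 1  # the mirrored entry is set when tile j is visited
    return adj
```

For example, `build_adjacency([(0, 0), (0, 1), (1, 1), (3, 3)])` links the first three tiles to one another and leaves the tile at `(3, 3)` isolated. A matrix of this kind could then be passed to a graph-style or neighbourhood-aware aggregator.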