Whole Slide Images (WSIs) pose a challenging computer vision problem because of their gigapixel size and the numerous artefacts they contain. Yet they are a valuable resource for patient diagnosis and stratification, and often represent the gold standard for diagnostic tasks. Real-world clinical datasets tend to consist of heterogeneous sets of WSIs labelled only at the patient level, with few or no annotations. Weakly supervised attention-based multiple instance learning approaches have been developed in recent years to address these challenges, but they can fail to resolve both long- and short-range dependencies. Here we propose an end-to-end multi-stain self-attention graph (MUSTANG) multiple instance learning pipeline, designed to solve a weakly supervised gigapixel multi-image classification task in which the label is assigned at the patient level and no slide-level labels or region annotations are available. The pipeline applies self-attention but restricts its operations to a highly sparse k-Nearest Neighbour graph of embedded WSI patches, constructed using Euclidean distance. We show that this approach achieves a state-of-the-art F1-score/AUC of 0.89/0.92, outperforming the widely used CLAM model. Our approach is highly modular and can easily be adapted to different clinical datasets: it requires only a patient-level label, without annotations, and accepts WSI sets of different sizes, since the graphs can vary in size and structure. The source code is available at https://github.com/AmayaGS/MUSTANG.
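To make the central idea concrete, the following is a minimal sketch of self-attention restricted to a k-Nearest Neighbour graph built from patch embeddings using Euclidean distance. It is not the MUSTANG implementation: the function names `knn_edges` and `graph_self_attention` are hypothetical, the attention uses the raw embeddings with no learned query, key, or value projections, and the dense distance matrix and mask are chosen for readability rather than for gigapixel-scale inputs.

```python
import numpy as np


def knn_edges(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return a (2, n*k) array of directed edges (source, neighbour).

    Hypothetical helper: each patch embedding is linked to its k nearest
    neighbours under Euclidean distance, excluding itself.
    """
    n = embeddings.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of patches")
    # Pairwise squared Euclidean distances via broadcasting (dense, O(n^2 d)).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)  # a patch is never its own neighbour
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    sources = np.repeat(np.arange(n), k)
    return np.stack([sources, neighbours.ravel()])


def graph_self_attention(x: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention restricted to graph edges.

    Assumption for illustration: no learned projections, and every node
    also attends to itself (self-loop).
    """
    n, d = x.shape
    mask = np.eye(n, dtype=bool)
    mask[edges[0], edges[1]] = True
    scores = (x @ x.T) / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)  # drop non-neighbours
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x


# Toy usage: 6 random "patch embeddings" of dimension 4, with k = 2.
rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))
edges = knn_edges(patches, k=2)
updated = graph_self_attention(patches, edges)
```

Because each node attends only to its k neighbours and itself, the number of attention weights grows linearly with the number of patches rather than quadratically. This is the sense in which restricting attention to a sparse graph makes the operation tractable for large sets of patches. A practical version would use a sparse neighbour search and sparse message passing instead of the dense mask shown here.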