Survival prediction based on whole slide images (WSIs) is a challenging task for patient-level multiple instance learning (MIL). Because each patient contributes a vast amount of data (one or more gigapixel WSIs) and WSIs are irregularly shaped, it is difficult to fully explore the spatial, contextual, and hierarchical interactions in the patient-level bag. Many studies adopt a random-sampling pre-processing strategy and WSI-level aggregation models, which inevitably lose critical prognostic information in the patient-level bag. In this work, we propose a hierarchical vision Transformer framework named HVTSurv, which can encode local-level relative spatial information, strengthen WSI-level context-aware communication, and establish patient-level hierarchical interaction. First, we design a feature pre-processing strategy comprising feature rearrangement and random window masking. Then, we devise three layers to progressively obtain the patient-level representation: a local-level interaction layer that adopts Manhattan distance, a WSI-level interaction layer that employs spatial shuffle, and a patient-level interaction layer that uses attention pooling. Moreover, the hierarchical network design makes the model more computationally efficient. Finally, we validate HVTSurv on 3,104 patients and 3,752 WSIs across 6 cancer types from The Cancer Genome Atlas (TCGA). The average C-Index is 2.50-11.30% higher than that of all prior weakly supervised methods over the 6 TCGA datasets. Ablation studies and attention visualization further verify the superiority of the proposed HVTSurv. The implementation is available at: https://github.com/szc19990412/HVTSurv.
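To make the Manhattan distance used by the local-level interaction layer concrete, the following is a minimal sketch that computes pairwise Manhattan (L1) distances between patch positions inside one window. The function name `manhattan_distance_matrix`, the `(row, col)` grid coordinates, and the 2x2 window are illustrative assumptions; the abstract does not specify how the distances enter the layer, so this shows only the distance computation itself, not the HVTSurv implementation.

```python
from typing import List, Tuple


def manhattan_distance_matrix(coords: List[Tuple[int, int]]) -> List[List[int]]:
    """Pairwise Manhattan (L1) distances between patch grid coordinates."""
    return [
        [abs(ri - rj) + abs(ci - cj) for (rj, cj) in coords]
        for (ri, ci) in coords
    ]


# Hypothetical 2x2 window of patches, given as (row, col) grid positions.
window = [(0, 0), (0, 1), (1, 0), (1, 1)]
dist = manhattan_distance_matrix(window)

for row in dist:
    print(row)
```

Adjacent patches are at distance 1 and diagonal neighbours at distance 2, so a matrix like this could serve as a relative spatial signal between patches in the same window.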
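The results above are reported as the concordance index (C-Index). As a reminder of what that metric measures, here is a minimal sketch of Harrell's C-Index for right-censored data. The function name `concordance_index` and the toy data are assumptions for illustration, and pairs with tied survival times are simply skipped; this is not the evaluation code used for HVTSurv.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-Index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event
    (events[i] == 1) and a strictly shorter time than patient j.
    The pair is concordant when patient i also has the higher risk score;
    tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): survival times, event flags (1 = event, 0 = censored),
# and predicted risk scores where higher means worse prognosis.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
risks = [0.9, 0.4, 0.1, 0.7]

c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk.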