DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis

Cancer prognosis on gigapixel Whole-Slide Images (WSIs) has long been a challenging task. To enhance WSI visual representations, existing methods have explored WSI image pyramids instead of single-resolution images. Nevertheless, they still face two major problems: high computational cost and an overlooked semantic gap in multi-resolution feature fusion. To tackle these problems, this paper proposes to exploit WSI pyramids efficiently from a new perspective: a dual-stream network with cross-attention (DSCA). Our key idea is to use two sub-streams to process WSI patches at two resolutions. A square pooling is devised in the high-resolution stream to significantly reduce computational cost, and a cross-attention-based method is proposed to properly fuse the dual-stream features. We validate DSCA on three publicly available datasets comprising a total of 3,101 WSIs from 1,911 patients. Our experiments and ablation studies verify that (i) DSCA outperforms existing state-of-the-art methods in cancer prognosis, with an average C-Index improvement of around 4.6%; (ii) DSCA is more computationally efficient: compared with a typical existing multi-resolution network, it has more learnable parameters (6.31M vs. 860.18K) but a lower computational cost (2.51G vs. 4.94G); and (iii) the key components of DSCA, the dual-stream design and cross-attention, indeed contribute to the model's performance, yielding an average C-Index gain of around 2.0% while keeping the computational load relatively small. DSCA could serve as an alternative and effective tool for WSI-based cancer prognosis.
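The dual-stream fusion described above can be illustrated with a small sketch. It is not the authors' implementation. It rests on several assumptions. First, each low-resolution patch covers an s × s square of high-resolution patches. Second, "square pooling" is taken to mean averaging the high-resolution features inside each such square. Third, fusion is single-head scaled dot-product cross-attention in which low-resolution tokens act as queries over the pooled high-resolution tokens, using identity projections and a residual connection. The function names `square_pool`, `cross_attention`, and `fuse` are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] -> feature vector of one patch


def square_pool(grid: Grid, s: int) -> Grid:
    """Average-pool each non-overlapping s x s square of patch features.

    Assumption: this averaging stands in for DSCA's square pooling.
    Rows/columns that do not fill a complete square are dropped.
    """
    rows, cols, dim = len(grid), len(grid[0]), len(grid[0][0])
    pooled: Grid = []
    for r0 in range(0, rows - s + 1, s):
        out_row: List[Vector] = []
        for c0 in range(0, cols - s + 1, s):
            acc = [0.0] * dim
            for r in range(r0, r0 + s):
                for c in range(c0, c0 + s):
                    for k in range(dim):
                        acc[k] += grid[r][c][k]
            out_row.append([v / (s * s) for v in acc])
        pooled.append(out_row)
    return pooled


def flatten(grid: Grid) -> List[Vector]:
    """Turn a 2-D grid of patch features into a row-major token list."""
    return [vec for row in grid for vec in row]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention with identity projections."""
    d = len(queries[0])
    out: List[Vector] = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(low_res: Grid, high_res: Grid, s: int) -> List[Vector]:
    """Low-resolution tokens attend to square-pooled high-resolution tokens.

    The attended vector is added back to each low-resolution token (residual).
    """
    low_tokens = flatten(low_res)
    high_tokens = flatten(square_pool(high_res, s))
    attended = cross_attention(low_tokens, high_tokens, high_tokens)
    return [[a + b for a, b in zip(lo, at)] for lo, at in zip(low_tokens, attended)]


if __name__ == "__main__":
    # Hypothetical toy input: a 1 x 2 low-resolution grid; each low-resolution
    # patch covers a 2 x 2 block of a 2 x 4 high-resolution grid (s = 2).
    low = [[[1.0, 0.0], [0.0, 1.0]]]
    high = [
        [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    ]
    print(len(flatten(square_pool(high, 2))))  # prints 2 (down from 8 tokens)
    print(fuse(low, high, 2))
```

Pooling each s × s square reduces the number of high-resolution keys and values by a factor of s². The cost of the cross-attention step falls by the same factor, which in this sketch is where the savings from square pooling come from.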
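The prognosis results are reported as C-Index (concordance index) values. For reference, the sketch below computes Harrell's C-Index, the common form of this metric for right-censored survival data. A pair of patients is comparable when the one with the shorter observed time actually experienced the event. The pair is concordant when that patient also received the higher predicted risk, and tied risks count as half. It is an assumption that the paper uses exactly this variant. This sketch simply skips pairs with tied survival times. The function name `harrell_c_index` is hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    times:  observed survival or censoring time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  predicted risk score; a higher risk should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:  # i had the event before j was last observed
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data: patient 3 is censored at t=6; one of five pairs is discordant.
    print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))  # 0.8
```

A C-Index of 0.5 corresponds to a random ranking of patients, and 1.0 to a perfect ranking.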
