Whole-slide imaging enables the capture and digitization of high-resolution images of histological specimens. Automated analysis of such images using deep learning models is therefore in high demand. The transformer architecture has been proposed as a candidate for effectively leveraging this high-resolution information. In this approach, the whole-slide image is partitioned into smaller image patches, and feature tokens are extracted from these patches. However, while the conventional transformer can process a large set of input tokens simultaneously, its computational demand scales quadratically with the number of input tokens and thus quadratically with the number of image patches. To address this problem, we propose a novel cascaded cross-attention network (CCAN), based on the cross-attention mechanism, that scales linearly with the number of extracted patches. Our experiments demonstrate that this architecture performs on par with or better than other attention-based state-of-the-art methods on two public datasets: for lung cancer (TCGA NSCLC), our model reaches a mean area under the receiver operating characteristic curve (AUC) of 0.970 $\pm$ 0.008, and for renal cancer (TCGA RCC), it reaches a mean AUC of 0.985 $\pm$ 0.004. Furthermore, we show that the proposed model remains effective in low-data regimes, making it a promising approach for analyzing whole-slide images in resource-limited settings. To foster research in this direction, we make our code publicly available on GitHub: XXX.
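The scaling argument can be illustrated with a minimal NumPy sketch. It is not the released CCAN implementation: the single-head formulation, the random projection weights, the helper names `cross_attention` and `random_projections`, and the stage sizes (64, then 16 latent tokens) are all assumptions chosen for illustration. The point it shows is that when a fixed number M of latent query tokens attends to N patch tokens, the attention matrix has M x N entries, which grows linearly in N, whereas self-attention over the patches would need N x N entries.

```python
import numpy as np


def cross_attention(latents, tokens, w_q, w_k, w_v):
    """Single-head cross-attention: latents (M, d) query tokens (N, d)."""
    q = latents @ w_q                           # (M, d) queries from latents
    k = tokens @ w_k                            # (N, d) keys from input tokens
    v = tokens @ w_v                            # (N, d) values from input tokens
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (M, N), linear in N for fixed M
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights                 # output (M, d), attention (M, N)


def random_projections(rng, d):
    """Hypothetical stand-in for learned query/key/value projections."""
    return [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3)]


rng = np.random.default_rng(0)
d, n_patches = 32, 1000
tokens = rng.standard_normal((n_patches, d))    # one feature token per patch

# Assumed cascade: each stage compresses its input into fewer latent tokens.
latent_sizes = [64, 16]
x = tokens
attn_shapes = []
for m in latent_sizes:
    latents = rng.standard_normal((m, d))       # stand-in for learned latents
    w_q, w_k, w_v = random_projections(rng, d)
    x, weights = cross_attention(latents, x, w_q, w_k, w_v)
    attn_shapes.append(weights.shape)

self_attn_entries = n_patches * n_patches       # N x N for plain self-attention
cross_attn_entries = attn_shapes[0][0] * attn_shapes[0][1]  # M x N, first stage
print(attn_shapes, self_attn_entries, cross_attn_entries)
```

Doubling `n_patches` in this sketch doubles the size of the first-stage attention matrix, while the later stages are unaffected because they only see the fixed-size latent set.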