Mimicking a Pathologist: Dual Attention Model for Scoring of Gigapixel Histology Images

Some major challenges in the automated processing of whole slide images (WSIs) include their sheer size, multiple magnification levels, and high resolution. Feeding these images directly into AI frameworks is computationally prohibitive because of memory constraints. Downsampling WSIs incurs information loss, and splitting them into tiles and patches discards important contextual information. We propose a novel dual attention approach, consisting of two main components, that mimics the visual examination performed by a pathologist. The first component is a soft attention model that takes a high-level view of the WSI as input and identifies regions of interest. We use a custom sampling method to extract diverse, spatially distinct image tiles from the selected high-attention regions. The second component is a hard attention classification model that extracts a sequence of multi-resolution glimpses from each tile for classification. Because hard attention is non-differentiable, we train this component with reinforcement learning. It predicts the location of each glimpse without processing all patches of a given tile, mirroring how a pathologist arrives at a diagnosis. We train the two components both separately and end-to-end with a joint loss function to demonstrate the efficacy of the proposed model. We apply the model to two immunohistochemistry (IHC) use cases: HER2 prediction in breast cancer, and prediction of the Intact/Loss status of two mismatch repair (MMR) biomarkers in colorectal cancer. The proposed model achieves accuracy comparable to state-of-the-art methods while processing only a small fraction of the WSI at the highest magnification.
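The abstract does not describe the custom sampling method in detail. Under our own assumptions, tile selection can be sketched with a greedy, non-maximum-suppression-style rule over a low-resolution soft-attention map. The rule visits grid cells in descending attention order, ignores cells below a threshold, and accepts a cell only if it lies at least `min_dist` cells from every tile already chosen. The function `sample_tiles` and its parameters `k`, `threshold`, and `min_dist` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def sample_tiles(
    attention: List[List[float]],
    k: int,
    threshold: float,
    min_dist: float,
) -> List[Tuple[int, int]]:
    """Greedily pick up to k (row, col) cells from a soft-attention map.

    Hypothetical rule: cells are visited in descending attention order,
    cells below `threshold` are ignored, and a cell is accepted only if its
    Euclidean distance to every previously accepted cell is >= min_dist.
    """
    # Flatten the grid into (score, row, col) triples above the threshold.
    candidates = [
        (score, r, c)
        for r, row in enumerate(attention)
        for c, score in enumerate(row)
        if score >= threshold
    ]
    # Highest attention first; ties broken by (row, col) for determinism.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    chosen: List[Tuple[int, int]] = []
    for _, r, c in candidates:
        if len(chosen) == k:
            break
        # Enforce spatial diversity: skip cells too close to earlier picks.
        if all(math.hypot(r - pr, c - pc) >= min_dist for pr, pc in chosen):
            chosen.append((r, c))
    return chosen


# Toy 4x4 attention map standing in for a low-resolution WSI view.
attn = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.3, 0.6],
    [0.0, 0.1, 0.2, 0.5],
]
print(sample_tiles(attn, k=3, threshold=0.4, min_dist=2.0))  # [(1, 1), (2, 3)]
```

The distance constraint is what makes the picks spatially distinct. In the toy map, the 0.8 and 0.7 cells sit next to the 0.9 peak and are skipped, so fewer than `k` tiles can be returned.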
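The abstract states that the hard attention component extracts multi-resolution glimpses from each tile but does not specify how. The sketch below assumes a glimpse in the style of recurrent visual attention models: square crops of increasing size, all centred on one location, each average-pooled down to the same small size. The function names `extract_glimpse`, `_crop`, and `_avg_pool`, the zero padding at tile borders, and the scale factor of 2 between levels are hypothetical choices, not details from the paper.

```python
from typing import List

Grid = List[List[float]]


def _crop(img: Grid, cy: int, cx: int, size: int) -> Grid:
    """Square crop of side `size` centred at (cy, cx); out-of-bounds pixels are 0."""
    h, w = len(img), len(img[0])
    top, left = cy - size // 2, cx - size // 2
    return [
        [
            img[y][x] if 0 <= y < h and 0 <= x < w else 0.0
            for x in range(left, left + size)
        ]
        for y in range(top, top + size)
    ]


def _avg_pool(patch: Grid, factor: int) -> Grid:
    """Downsample a square patch by averaging non-overlapping factor x factor blocks."""
    n = len(patch) // factor
    return [
        [
            sum(
                patch[i * factor + a][j * factor + b]
                for a in range(factor)
                for b in range(factor)
            )
            / factor ** 2
            for j in range(n)
        ]
        for i in range(n)
    ]


def extract_glimpse(img: Grid, cy: int, cx: int, base: int, scales: int) -> List[Grid]:
    """Return `scales` crops of side base, 2*base, 4*base, ..., each pooled to base x base."""
    glimpse = []
    for s in range(scales):
        factor = 2 ** s
        glimpse.append(_avg_pool(_crop(img, cy, cx, base * factor), factor))
    return glimpse


img = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # toy 4x4 "tile"
g = extract_glimpse(img, cy=2, cx=2, base=2, scales=2)
print(g[0])  # [[5.0, 6.0], [9.0, 10.0]]   (2x2 crop at full resolution)
print(g[1])  # [[2.5, 4.5], [10.5, 12.5]]  (4x4 crop pooled to 2x2)
```

Keeping every scale at the same `base x base` size gives the downstream glimpse network a fixed-size input, while the larger scales still carry coarse surrounding context.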
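Sampling a glimpse location is non-differentiable, so the abstract trains the hard attention component with reinforcement learning. The specific algorithm is not stated there. The sketch below assumes the common REINFORCE (score-function) estimator with a Gaussian location policy of fixed standard deviation and a scalar baseline, as used in recurrent visual attention models. The names `sample_location` and `reinforce_grad_mean` are hypothetical. The reward (for example, 1 for a correct classification and 0 otherwise) is also an assumption.

```python
import random
from typing import List


def sample_location(mean: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Draw a glimpse location from an isotropic Gaussian policy N(mean, sigma^2 I)."""
    return [rng.gauss(m, sigma) for m in mean]


def reinforce_grad_mean(
    loc: List[float],
    mean: List[float],
    sigma: float,
    reward: float,
    baseline: float,
) -> List[float]:
    """Gradient of the REINFORCE loss -(R - b) * log N(loc; mean, sigma^2 I) w.r.t. mean.

    Since d/d(mean) log N(loc; mean, sigma^2) = (loc - mean) / sigma^2 per
    coordinate, the loss gradient is -(R - b) * (loc - mean) / sigma^2.
    """
    advantage = reward - baseline
    return [-advantage * (x - m) / sigma ** 2 for x, m in zip(loc, mean)]


# In a rollout, the location would be sampled from the policy:
rng = random.Random(0)
sampled = sample_location([0.0, 0.0], 0.5, rng)

# One illustrative step with a fixed location so the numbers are reproducible.
grad = reinforce_grad_mean([0.5, -0.25], [0.0, 0.0], sigma=0.5, reward=1.0, baseline=0.25)
print(grad)  # [-1.5, 0.75]
```

Descending on this gradient (`mean -= lr * grad`) moves the policy mean toward locations whose reward exceeded the baseline and away from those that fell short. Subtracting the baseline reduces the variance of the estimate without biasing it.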
