Pathological diagnosis is the gold standard for cancer diagnosis. Recently, methods that combine the Transformer with the multiple instance learning (MIL) framework have achieved superior performance on whole slide images (WSIs). However, the giga-pixel scale of WSIs makes it challenging to apply the Transformer's quadratic-complexity self-attention mechanism in MIL. Existing studies typically adopt linear attention to improve computational efficiency, but this inevitably introduces a performance bottleneck. To tackle this challenge, we propose MamMIL, a framework for WSI classification that, for the first time, integrates the selective structured state space model (i.e., Mamba) with MIL, enabling the modeling of instance dependencies while maintaining linear complexity. Specifically, since Mamba only supports unidirectional one-dimensional (1D) sequence modeling, we introduce a bidirectional state space model and a 2D context-aware block, enabling MamMIL to learn bidirectional instance dependencies with 2D spatial relationships. Experiments on two datasets show that MamMIL achieves advanced classification performance with a smaller memory footprint than state-of-the-art Transformer-based MIL frameworks. The code will be open-sourced upon acceptance.
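To make the core idea concrete, below is a minimal PyTorch sketch of bidirectional state-space aggregation over a bag of WSI patch features. It is an illustrative assumption, not the authors' MamMIL implementation: it uses a simplified non-selective diagonal linear SSM in place of Mamba's selective scan, omits the 2D context-aware block, and all names (`SimpleSSM`, `BiSSMBlock`, `BiSSMMIL`, `d_state`) are hypothetical.

```python
# Illustrative sketch only: a simplified, non-selective linear SSM scanned in
# both directions over a bag of patch features. NOT the authors' MamMIL code.
import torch
import torch.nn as nn


class SimpleSSM(nn.Module):
    """Minimal diagonal linear state-space recurrence:
    h_t = exp(A) * h_{t-1} + B x_t,   y_t = C h_t
    (hypothetical stand-in for Mamba's selective scan)."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(-0.5 * torch.rand(d_state))   # A <= 0 -> decay in (0, 1]
        self.B = nn.Linear(d_model, d_state, bias=False)
        self.C = nn.Linear(d_state, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_instances, d_model); sequential scan, linear in bag size
        decay = torch.exp(self.A)                           # (d_state,)
        u = self.B(x)                                       # (N, d_state)
        h = torch.zeros(u.size(1), device=x.device)
        ys = []
        for t in range(x.size(0)):
            h = decay * h + u[t]                            # recurrent state update
            ys.append(h)
        return self.C(torch.stack(ys))                      # (N, d_model)


class BiSSMBlock(nn.Module):
    """Bidirectional scan: one SSM over the instance sequence, another over
    its reversal, fused with a residual connection and LayerNorm."""

    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.fwd = SimpleSSM(d_model, d_state)
        self.bwd = SimpleSSM(d_model, d_state)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.fwd(x) + torch.flip(self.bwd(torch.flip(x, dims=[0])), dims=[0])
        return self.norm(x + y)


class BiSSMMIL(nn.Module):
    """Bag-level classifier: project patch features, apply the bidirectional
    SSM block, then mean-pool instances into a slide-level prediction."""

    def __init__(self, d_in: int = 1024, d_model: int = 256, n_classes: int = 2):
        super().__init__()
        self.proj = nn.Linear(d_in, d_model)
        self.block = BiSSMBlock(d_model)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (num_instances, d_in) pre-extracted patch embeddings of one WSI
        z = self.block(self.proj(bag))
        return self.head(z.mean(dim=0))                     # (n_classes,)


if __name__ == "__main__":
    bag = torch.randn(500, 1024)                            # e.g., 500 patch embeddings
    print(BiSSMMIL()(bag).shape)                            # torch.Size([2])
```

Note the design point the abstract emphasizes: the scan visits each instance once per direction, so memory and compute grow linearly with bag size, unlike the quadratic cost of pairwise self-attention.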