Whole-slide image analysis in computational pathology often relies on processing tessellated gigapixel images for which only slide-level labels are available. Applying multiple instance learning-based methods or transformer models is computationally expensive because, for each image, all instances have to be processed simultaneously. The MLP-Mixer is an under-explored alternative to common vision transformers, especially for large-scale datasets. Because it lacks a self-attention mechanism, its computational complexity is linear in the number of input patches, yet it achieves comparable performance on natural image datasets. We propose a combination of feature embedding and clustering that preprocesses the full whole-slide image into a reduced prototype representation, which can then serve as input to a suitable MLP-Mixer architecture. Our experiments on two public benchmarks and one in-house malignant lymphoma dataset show performance comparable to current state-of-the-art methods, while achieving lower training costs in terms of computational time and memory load. Code is publicly available at https://github.com/butkej/ProtoMixer.
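The prototype pipeline described above can be illustrated with a minimal sketch. It assumes plain k-means as the clustering method, random vectors in place of a pretrained patch encoder, and a single simplified Mixer block (no biases, no learnable LayerNorm parameters) with untrained random weights. All function names and sizes here (`kmeans_prototypes`, `mixer_block`, `classify_slide`, 500 patches, 16 prototypes) are hypothetical. This is not the code from the ProtoMixer repository.

```python
import numpy as np


def kmeans_prototypes(embeddings, k, n_iter=20, seed=0):
    """Reduce N patch embeddings (N, D) to k prototype vectors (k, D).

    Plain k-means (Lloyd's algorithm) is an assumed stand-in for the
    clustering step; the paper's exact clustering may differ.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    # Initialise centroids from k distinct randomly chosen patches.
    centroids = embeddings[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every patch to every centroid: (N, k).
        dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = embeddings[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def layer_norm(x, eps=1e-6):
    # Normalise over the last (channel) axis, without learnable scale/shift.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


def mlp(x, w1, w2):
    return gelu(x @ w1) @ w2


def mixer_block(x, p):
    """One simplified MLP-Mixer block on x of shape (S tokens, C channels)."""
    # Token mixing: MLP along the token axis, shared across channels.
    y = layer_norm(x)
    x = x + mlp(y.T, p["tok_w1"], p["tok_w2"]).T
    # Channel mixing: MLP along the channel axis, shared across tokens.
    y = layer_norm(x)
    return x + mlp(y, p["ch_w1"], p["ch_w2"])


def init_params(S, C, Ds, Dc, n_classes, rng, scale=0.02):
    return {
        "tok_w1": rng.normal(0, scale, (S, Ds)),
        "tok_w2": rng.normal(0, scale, (Ds, S)),
        "ch_w1": rng.normal(0, scale, (C, Dc)),
        "ch_w2": rng.normal(0, scale, (Dc, C)),
        "head": rng.normal(0, scale, (C, n_classes)),
    }


def classify_slide(prototypes, p):
    # Mixer block -> final norm -> mean over tokens -> linear head (slide logits).
    x = layer_norm(mixer_block(prototypes, p))
    return x.mean(axis=0) @ p["head"]


rng = np.random.default_rng(0)
patch_emb = rng.normal(size=(500, 32))        # hypothetical: 500 patches, 32-d features
protos = kmeans_prototypes(patch_emb, k=16)   # (16, 32): fixed size per slide
params = init_params(S=16, C=32, Ds=64, Dc=128, n_classes=3, rng=rng)
logits = classify_slide(protos, params)
print(logits.shape)
```

Because the Mixer sees k prototype tokens instead of all N patches, the cost of the token-mixing MLP no longer grows with slide size. A fixed k also gives the token-mixing MLP the fixed input length that its weight shapes require.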