MedTransformer: Accurate AD Diagnosis for 3D MRI Images through 2D Vision Transformers

Automated diagnosis of Alzheimer's disease (AD) from brain images is becoming a clinically important technique for supporting precise and efficient diagnosis and treatment planning. A few efforts have been made to diagnose AD automatically from magnetic resonance imaging (MRI) using three-dimensional (3D) convolutional neural networks (CNNs). However, owing to the complexity of 3D models, their performance remains unsatisfactory in terms of both accuracy and efficiency. To avoid the complexity of 3D images and 3D models, in this study we address the problem with 2D vision Transformers. We propose a 2D Transformer-based medical image model that diagnoses AD in 3D MRI images by cutting each 3D image into multiple 2D slices and applying several Transformer attention encoders. The model consists of four main components: encoders shared across the three dimensions, dimension-specific encoders, attention across images from the same dimension, and attention across the three dimensions. Together, these components capture attention relationships among multiple slices and among the slice sequences from the different dimensions (axial, coronal, and sagittal). We also propose morphology augmentation, an erosion- and dilation-based method that increases the structural difference between AD and normal images. In our experiments, we use multiple datasets, drawn from ADNI, AIBL, MIRIAD, and OASIS, to evaluate the model. MedTransformer demonstrates a strong ability to diagnose AD on these datasets. The results indicate that it learns effectively from 3D data with a much smaller model and generalizes across different medical tasks, which offers a way to help doctors diagnose AD more simply.
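The two preprocessing ideas named in the abstract, cutting a 3D volume into 2D slices along three dimensions and erosion/dilation-based morphology augmentation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the axis-to-view mapping, the evenly spaced slice selection, and the 3x3 structuring element are all assumptions made for illustration, and the abstract does not say how the eroded or dilated images are used during training.

```python
import numpy as np

# Assumption: axis 0 = sagittal, axis 1 = coronal, axis 2 = axial.
# The real mapping depends on how the scan was loaded and oriented.
VIEW_NAMES = ("sagittal", "coronal", "axial")


def extract_slices(volume, num_slices):
    """Cut a 3D volume into `num_slices` 2D slices along each of the three axes.

    Returns a dict mapping view name to an array of shape (num_slices, H, W).
    Slice positions are evenly spaced and skip the two outermost positions
    (an illustrative choice, not taken from the paper).
    """
    views = {}
    for axis, name in enumerate(VIEW_NAMES):
        size = volume.shape[axis]
        positions = np.linspace(0, size - 1, num_slices + 2)
        idx = positions.round().astype(int)[1:-1]
        picked = np.take(volume, idx, axis=axis)
        # Move the slicing axis to the front so every view is (num_slices, H, W).
        views[name] = np.moveaxis(picked, axis, 0)
    return views


def _neighborhood_stack(image):
    """Stack the nine shifted copies that make up each pixel's 3x3 neighborhood."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    return np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )


def erode(image):
    """Grayscale erosion with a 3x3 square: each pixel becomes its local minimum."""
    return _neighborhood_stack(image).min(axis=0)


def dilate(image):
    """Grayscale dilation with a 3x3 square: each pixel becomes its local maximum."""
    return _neighborhood_stack(image).max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.random((64, 80, 72))  # stand-in for a preprocessed MRI volume
    views = extract_slices(volume, num_slices=8)
    for name, slices in views.items():
        print(name, slices.shape)
    one_slice = views["axial"][0]
    print(erode(one_slice).shape, dilate(one_slice).shape)
```

Erosion shrinks bright structures and dilation expands them, which is the sense in which such operators can alter structural appearance in a slice. In a model like the one described, each view's stack of slices would then be passed to the 2D encoders.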
