The transformer is primarily used in natural language processing. Recently, it has been adopted in computer vision (CV), where it shows promise. Medical image analysis (MIA), a critical branch of CV, also benefits greatly from this state-of-the-art technique. In this review, we first recap the core component of the transformer, the attention mechanism, and then the detailed structure of the transformer. After that, we describe recent progress of the transformer in MIA. We organize the applications by task, including classification, segmentation, captioning, registration, detection, enhancement, localization, and synthesis. The mainstream classification and segmentation tasks are further divided into eleven medical image modalities. A large number of the experiments surveyed in this review indicate that transformer-based methods outperform existing methods across multiple evaluation metrics. Finally, we discuss open challenges and future opportunities in this field. With its up-to-date content, detailed information, and comprehensive comparisons, this task-modality review may greatly benefit the broad MIA community.