The transformer is primarily used in natural language processing. Recently, it has been adopted in computer vision (CV), where it shows promise. Medical image analysis (MIA), a critical branch of CV, also benefits greatly from this state-of-the-art technique. In this review, we first recap the core component of the transformer, the attention mechanism, and then the detailed structure of the transformer. After that, we describe the recent progress of the transformer in MIA. We organize the applications by task: classification, segmentation, captioning, registration, detection, reconstruction, denoising, localization, and synthesis. The mainstream classification and segmentation tasks are further divided into eleven medical image modalities. Finally, we discuss the open challenges and future opportunities in this field. With its up-to-date coverage, detailed information, and task-modality organization, this review may greatly benefit the broad MIA community.