Unpaired medical image synthesis aims to provide complementary information for accurate clinical diagnosis and to address the challenges of obtaining aligned multi-modal medical scans. Transformer-based models excel in image translation tasks thanks to their ability to capture long-range dependencies. Although effective in supervised training settings, their performance falters in unpaired image synthesis, particularly in synthesizing structural details. This paper empirically demonstrates that, lacking strong inductive biases, Transformers can converge to non-optimal solutions in the absence of paired data. To address this, we introduce the UNet Structured Transformer (UNest), a novel architecture incorporating structural inductive biases for unpaired medical image synthesis. We leverage the foundational Segment-Anything Model to precisely extract the foreground structure and perform structural attention within the main anatomy. This guides the model to learn key anatomical regions, thus improving structural synthesis under the lack of supervision in unpaired training. Evaluated on two public datasets spanning three modalities, i.e., MR, CT, and PET, UNest improves upon recent methods by up to 19.30% across six medical image synthesis tasks. Our code is released at https://github.com/HieuPhan33/MICCAI2024-UNest.
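The structural attention described above restricts attention to tokens inside the foreground anatomy extracted by a segmentation model. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: the function name, tensor layout, and the choice to let background tokens fall back to unrestricted attention are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def structural_attention(q, k, v, fg_mask):
    """Scaled dot-product attention restricted by a foreground mask.

    q, k, v: (B, N, D) token embeddings.
    fg_mask: (B, N) boolean, True where a token lies inside the main
    anatomy (e.g., from a Segment-Anything foreground mask).

    Foreground tokens attend only to other foreground tokens, so
    anatomical regions are modeled without interference from the
    background; background tokens attend everywhere (an assumed
    fallback, not specified in the abstract).
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (B, N, N)
    # Block attention links from foreground queries to background keys.
    block = fg_mask.unsqueeze(2) & ~fg_mask.unsqueeze(1)  # (B, N, N)
    scores = scores.masked_fill(block, float("-inf"))
    return F.softmax(scores, dim=-1) @ v
```

Because foreground queries place zero weight on background keys, perturbing background tokens leaves the foreground outputs unchanged, which is the intended structural inductive bias.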