Self-supervised learning (SSL) and diffusion models have advanced representation learning and image synthesis, respectively. In 3D medical imaging, however, they remain separate: diffusion models are used for synthesis and SSL for analysis. Unifying 3D medical image synthesis and analysis is intuitive yet challenging: multi-center datasets exhibit dominant style shifts, downstream tasks rely on anatomy, and site-specific style co-varies with anatomy across slices, so the two factors cannot be separated reliably without explicit constraints. In this paper, we propose MeDUET, a 3D Medical image Disentangled UnifiEd PreTraining framework that performs SSL in a Variational Autoencoder (VAE) latent space which explicitly disentangles domain-invariant content from domain-specific style. A token demixing mechanism turns disentanglement from a modeling assumption into an empirically identifiable property. Two novel proxy tasks, Mixed-Factor Token Distillation (MFTD) and Swap-invariance Quadruplet Contrast (SiQC), are devised to synergistically enhance disentanglement. Once pretrained, MeDUET (i) delivers higher fidelity, faster convergence, and improved controllability for synthesis, and (ii) achieves strong domain generalization and notable label efficiency for analysis across diverse medical benchmarks. In summary, MeDUET converts multi-source heterogeneity from an obstacle into a learning signal, enabling unified pretraining for 3D medical image synthesis and analysis. The code is available at https://github.com/JK-Liu7/MeDUET.