Due to the three-dimensional nature of CT or MR scans, generative modeling of medical images is a particularly challenging task. Existing approaches mostly apply patch-wise, slice-wise, or cascaded generation techniques to fit the high-dimensional data into the limited GPU memory. However, these approaches may introduce artifacts and potentially restrict the model's applicability for certain downstream tasks. This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model to wavelet-decomposed images. The presented approach is a simple yet effective way of scaling 3D diffusion models to high resolutions and can be trained on a single \SI{40}{\giga\byte} GPU. Experimental results on unconditional image generation on BraTS and LIDC-IDRI at a resolution of $128 \times 128 \times 128$ demonstrate state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores compared to recent GANs, diffusion models, and latent diffusion models. Our proposed method is the only one capable of generating high-quality images at a resolution of $256 \times 256 \times 256$, outperforming all competing methods.
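The memory saving comes from the wavelet decomposition itself: a single-level 3D discrete wavelet transform halves the spatial resolution along each axis and yields eight subbands, so a $128^3$ volume becomes an 8-channel $64^3$ tensor that the diffusion model can process directly. The following is a minimal, illustrative sketch of such a single-level 3D Haar decomposition in NumPy; the function name and channel layout are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def haar_dwt_3d(vol):
    """Single-level 3D Haar DWT (orthonormal scaling).

    Splits a (D, H, W) volume into 8 subbands, each at half the
    resolution per axis, stacked as channels: (8, D/2, H/2, W/2).
    """
    def split(x, axis):
        # 1-D Haar analysis along one axis: pairwise sum/difference
        even = np.take(x, range(0, x.shape[axis], 2), axis=axis)
        odd = np.take(x, range(1, x.shape[axis], 2), axis=axis)
        return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

    bands = [vol]
    for axis in range(3):
        # each pass doubles the number of subbands: 1 -> 2 -> 4 -> 8
        bands = [b for band in bands for b in split(band, axis)]
    return np.stack(bands)

# a 128^3 scan becomes an 8-channel 64^3 tensor for the diffusion model
volume = np.random.randn(128, 128, 128)
coeffs = haar_dwt_3d(volume)
print(coeffs.shape)  # (8, 64, 64, 64)
```

Because the Haar transform with $1/\sqrt{2}$ scaling is orthonormal, it is lossless and energy-preserving, so samples generated in the wavelet domain can be mapped back to full-resolution volumes by the exact inverse transform.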