Due to the three-dimensional nature of CT or MR scans, generative modeling of medical images is particularly challenging. Existing approaches mostly rely on patch-wise, slice-wise, or cascaded generation to fit the high-dimensional data into limited GPU memory. However, these approaches may introduce artifacts and can restrict the model's applicability to certain downstream tasks. This work presents WDM, a wavelet-based medical image synthesis framework that applies a diffusion model to wavelet-decomposed images. The approach is a simple yet effective way of scaling diffusion models to high resolutions and can be trained on a single 40 GB GPU. Experimental results for unconditional image generation on BraTS and LIDC-IDRI at a resolution of $128 \times 128 \times 128$ show state-of-the-art image fidelity (FID) and sample diversity (MS-SSIM) scores compared to GANs, diffusion models, and latent diffusion models. Of the compared methods, ours is the only one capable of generating high-quality images at a resolution of $256 \times 256 \times 256$.
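To make the idea of working on wavelet-decomposed volumes concrete, the following is a minimal sketch of a single-level 3D discrete wavelet transform and its inverse. It is an illustration under assumptions, not the WDM implementation: the abstract does not name the wavelet, so an orthonormal Haar wavelet is assumed here, and the function names `haar_split`, `haar_merge`, `dwt3d`, and `idwt3d` are hypothetical. The sketch shows that a volume of side $N$ becomes eight subbands of side $N/2$ and that the transform is exactly invertible.

```python
import numpy as np


def haar_split(x, axis):
    """One-level orthonormal Haar analysis along one axis: returns (low, high)."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def haar_merge(low, high, axis):
    """Inverse of haar_split: interleave the reconstructed even/odd samples."""
    shape = list(low.shape)
    shape[axis] *= 2
    out = np.empty(shape, dtype=low.dtype)
    even_idx = [slice(None)] * out.ndim
    odd_idx = [slice(None)] * out.ndim
    even_idx[axis] = slice(0, None, 2)
    odd_idx[axis] = slice(1, None, 2)
    out[tuple(even_idx)] = (low + high) / np.sqrt(2)
    out[tuple(odd_idx)] = (low - high) / np.sqrt(2)
    return out


def dwt3d(volume):
    """Split a (D, H, W) volume into 8 subbands, stacked as (8, D/2, H/2, W/2).

    Subband 0 is the low-pass band along all three axes (assumed ordering).
    """
    bands = [volume]
    for axis in range(3):
        bands = [part for b in bands for part in haar_split(b, axis)]
    return np.stack(bands)


def idwt3d(coeffs):
    """Reconstruct the (D, H, W) volume from the 8 stacked subbands."""
    bands = list(coeffs)
    for axis in reversed(range(3)):
        bands = [haar_merge(bands[i], bands[i + 1], axis)
                 for i in range(0, len(bands), 2)]
    return bands[0]


# Small stand-in for a scan; all sides must be even.
volume = np.random.default_rng(0).standard_normal((32, 32, 32))
coeffs = dwt3d(volume)
print(coeffs.shape)
print(np.allclose(idwt3d(coeffs), volume))
```

With this layout, a $128 \times 128 \times 128$ volume would become eight subbands of size $64 \times 64 \times 64$. One plausible design, assumed here rather than taken from the abstract, is to treat the eight subbands as channels, run the diffusion model at the halved spatial resolution, and apply the inverse transform to generated coefficients to obtain a full-resolution volume.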