The scarcity of publicly available medical imaging data limits the development of effective AI models. This work proposes a memory-efficient, patch-wise denoising diffusion probabilistic model (DDPM) for generating synthetic medical images, focusing on CT scans with lung nodules. The approach generates high-utility synthetic images together with their nodule segmentations while keeping memory requirements manageable, which makes it practical to build training datasets. We evaluate the method in two scenarios: training a segmentation model exclusively on synthetic data, and augmenting real-world training data with synthetic images. In the first scenario, models trained solely on synthetic data achieve Dice scores comparable to those of benchmark models trained on real-world data. In the second, augmenting real-world data with synthetic images significantly improves segmentation performance. These results indicate that the generated images can enhance medical image datasets when real-world data are limited.
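Both evaluation scenarios are scored with the Dice coefficient, which measures the overlap between a predicted and a reference segmentation. The snippet below is only a minimal sketch of that metric for binary masks flattened to sequences of 0/1. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from this work.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as flat 0/1 sequences.

    Dice = 2 * |pred AND target| / (|pred| + |target|)
    """
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")

    # Number of voxels marked as nodule in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total number of nodule voxels across the two masks.
    total = sum(pred) + sum(target)

    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(dice_score(predicted, reference))
```

A score of 1.0 means the two masks coincide and 0.0 means they share no voxels, so "comparable Dice scores" means the synthetic-only model overlaps the reference annotations about as well as the model trained on real data.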