Dataset expansion can effectively alleviate the data scarcity that privacy concerns and labeling difficulties cause in medical image segmentation. However, existing expansion algorithms still struggle because they cannot guarantee the diversity of synthesized images with paired segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have shown powerful image synthesis performance, even surpassing Generative Adversarial Networks. Based on this insight, we propose DiffuseExpand, an approach that uses DPMs to expand datasets for 2D medical image segmentation. It first samples a variety of masks from Gaussian noise to ensure diversity, and then synthesizes the corresponding images so that images and masks stay aligned. After that, DiffuseExpand selects high-quality samples to further enhance the effectiveness of data expansion. Our comparison and ablation experiments on the COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness of DiffuseExpand. Our code is released at https://github.com/shaoshitong/DiffuseExpand.
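The three stages described above (sample masks, synthesize aligned images, keep high-quality pairs) can be sketched roughly as follows. This is only an illustrative skeleton, not the released implementation: `sample_masks`, `synthesize_images`, `quality_score`, and `expand_dataset` are hypothetical names, the two samplers are trivial NumPy stand-ins for trained DPMs, conditioning the image stage on the mask is an assumption, and the Dice-based score and `keep_ratio` are assumed placeholders for whatever selection criterion is actually used.

```python
import numpy as np


def sample_masks(n, shape, rng):
    """Stand-in for a mask DPM: start from Gaussian noise, return binary masks."""
    noise = rng.standard_normal((n, *shape))
    return (noise > 0).astype(np.float32)


def synthesize_images(masks, rng):
    """Stand-in for an image DPM that is given the mask (assumed conditioning)."""
    noise = rng.standard_normal(masks.shape).astype(np.float32)
    return masks + 0.1 * noise


def quality_score(image, mask):
    """Hypothetical quality proxy: Dice overlap of the thresholded image and mask."""
    pred = (image > 0.5).astype(np.float32)
    inter = float((pred * mask).sum())
    return 2.0 * inter / (float(pred.sum()) + float(mask.sum()) + 1e-8)


def expand_dataset(n, shape=(32, 32), keep_ratio=0.5, seed=0):
    """Generate n (image, mask) pairs and keep the top-scoring fraction."""
    rng = np.random.default_rng(seed)
    masks = sample_masks(n, shape, rng)        # stage 1: diverse masks from noise
    images = synthesize_images(masks, rng)     # stage 2: images aligned with masks
    scores = np.array([quality_score(i, m) for i, m in zip(images, masks)])
    k = max(1, int(n * keep_ratio))
    keep = np.argsort(scores)[::-1][:k]        # stage 3: best pairs first
    return images[keep], masks[keep], scores[keep]


if __name__ == "__main__":
    images, masks, scores = expand_dataset(8, shape=(16, 16))
    print(images.shape, masks.shape, scores.shape)
```

In a real pipeline the two stand-in samplers would be replaced by trained diffusion models, and the selected pairs would be appended to the original training set before fitting the segmentation network.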