Dataset expansion can effectively alleviate data scarcity in medical image segmentation, a problem that arises from privacy concerns and labeling difficulties. However, existing expansion algorithms still face great challenges because they cannot guarantee the diversity of synthesized images with paired segmentation masks. In recent years, Diffusion Probabilistic Models (DPMs) have shown powerful image synthesis performance, even surpassing Generative Adversarial Networks. Based on this insight, we propose DiffuseExpand, a DPM-based approach for expanding datasets for 2D medical image segmentation. It first samples a variety of masks from Gaussian noise to ensure diversity, and then synthesizes the corresponding images so that images and masks are aligned. Finally, DiffuseExpand selects high-quality samples to further enhance the effectiveness of data expansion. Our comparison and ablation experiments on the COVID-19 and CGMH Pelvis datasets demonstrate the effectiveness of DiffuseExpand. Our code is released at https://anonymous.4open.science/r/DiffuseExpand.
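The three stages described above (mask sampling, image synthesis, sample selection) can be illustrated with a minimal control-flow sketch. It does not reproduce the DiffuseExpand implementation: the two diffusion models are replaced by toy stand-ins, and every name in it (`sample_mask`, `synthesize_image`, `quality_score`, `expand_dataset`) is hypothetical. In particular, the contrast-based quality score and the assumption that the image is generated conditioned on the mask are illustrative choices, not details taken from the released code.

```python
import random


def sample_mask(rng, size=16):
    """Toy stand-in for stage 1: turn Gaussian noise into a binary mask."""
    noise = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]
    mask = []
    for r in range(size):
        row = []
        for c in range(size):
            # Sum a 3x3 neighbourhood (wrapping at the edges) so the
            # thresholded mask forms blobs rather than isolated pixels.
            total = sum(
                noise[(r + dr) % size][(c + dc) % size]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            )
            row.append(1 if total > 0 else 0)
        mask.append(row)
    return mask


def synthesize_image(rng, mask):
    """Toy stand-in for stage 2: an image generated from the given mask."""
    return [[0.8 * m + rng.gauss(0.0, 0.1) for m in row] for row in mask]


def quality_score(image, mask):
    """Hypothetical quality proxy: foreground/background intensity contrast."""
    fg = [p for irow, mrow in zip(image, mask) for p, m in zip(irow, mrow) if m == 1]
    bg = [p for irow, mrow in zip(image, mask) for p, m in zip(irow, mrow) if m == 0]
    if not fg or not bg:
        return 0.0  # degenerate mask: all foreground or all background
    return sum(fg) / len(fg) - sum(bg) / len(bg)


def expand_dataset(n_candidates, keep, seed=0):
    """Generate candidate (image, mask) pairs and keep the top-scoring ones."""
    rng = random.Random(seed)
    scored = []
    for _ in range(n_candidates):
        mask = sample_mask(rng)               # stage 1: mask first, for diversity
        image = synthesize_image(rng, mask)   # stage 2: image aligned with mask
        scored.append((quality_score(image, mask), image, mask))
    # Stage 3: sort by score only, then keep the best `keep` pairs.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(image, mask) for _, image, mask in scored[:keep]]


pairs = expand_dataset(n_candidates=8, keep=3)
```

Generating the mask before the image means each synthesized image has a paired label by construction; the selection step then only has to rank complete pairs rather than repair misaligned ones.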