Training deep learning models requires large annotated datasets, but in medical imaging, data sharing is often complicated by ethics, anonymization and data protection legislation. Generative AI models, such as generative adversarial networks (GANs) and diffusion models, can now produce very realistic synthetic images and could potentially facilitate data sharing. However, before synthetic medical images can be shared, it must first be demonstrated that they can be used to train different networks with acceptable performance. We therefore comprehensively evaluate four GANs (progressive GAN, StyleGAN 1-3) and a diffusion model for the task of brain tumor segmentation, using two segmentation networks (U-Net and a Swin transformer). Our results show that segmentation networks trained on synthetic images reach Dice scores that are 80% to 90% of the Dice scores obtained when training with real images, but that memorization of the training images can be a problem for diffusion models if the original dataset is too small. We conclude that sharing synthetic medical images is a viable alternative to sharing real images, but that further work is required. The trained generative models and the generated synthetic images are shared on the AIDA data hub.
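To make the reported metric concrete, the following is a minimal sketch of how a Dice score and the ratio between synthetic-trained and real-trained Dice scores could be computed. It assumes binary segmentation masks stored as NumPy arrays; the function names `dice_score` and `relative_dice` are hypothetical and are not taken from the paper, which does not specify its evaluation code.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Assumption: masks are binary (nonzero = tumor). Two empty masks are
    treated as a perfect match, which is one common convention.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * intersection / denom)


def relative_dice(dice_synthetic, dice_real):
    """Dice of a synthetic-trained network as a fraction of the real-trained Dice."""
    return dice_synthetic / dice_real


if __name__ == "__main__":
    # Toy masks for illustration only, not data from the study.
    pred = np.array([[1, 1], [0, 0]])
    target = np.array([[1, 0], [0, 0]])
    print(dice_score(pred, target))
    # Hypothetical mean Dice values for a synthetic-trained and a real-trained network.
    print(relative_dice(0.72, 0.90))
```

Under this reading, a relative Dice between 0.8 and 0.9 corresponds to the 80% to 90% range stated above.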