Artificial Intelligence (AI)-based models can help diagnose COVID-19 from lung CT scans and X-ray images; however, these models require large amounts of data for training and validation. Many researchers have studied Generative Adversarial Networks (GANs) for producing synthetic lung CT scans and X-ray images to improve the performance of AI-based models. However, how well GAN-based methods generate reliable synthetic data has not been well explored. This work analyzes 43 published studies that reported using GANs for synthetic data generation. Many of these studies suffered from data bias, a lack of reproducibility, and a lack of feedback from radiologists or other domain experts. A common issue in these studies is the unavailability of source code, which hinders reproducibility. The included studies reported rescaling the input images to fit existing GAN architectures without providing clinical insight into what motivated the rescaling. Finally, even though GAN-based methods have potential for data augmentation and for improving the training of AI-based models, they fall short in terms of use in clinical practice. This paper highlights research hotspots in countering the data scarcity problem, identifies various issues as well as potentials, and provides recommendations to guide future research. These recommendations may help improve the acceptability of GAN-based approaches to data augmentation, which are becoming increasingly popular in the AI and medical imaging research communities.