Evaluating the Quality and Diversity of DCGAN-based Generatively Synthesized Diabetic Retinopathy Imagery

From arXiv; 29 pages, 8 figures; submitted to MEDAL23: Advances in Deep Generative Models for Medical Artificial Intelligence (Springer Nature series)

Publicly available diabetic retinopathy (DR) datasets are imbalanced, containing limited numbers of images with DR. This imbalance contributes to overfitting when training machine learning classifiers. The impact of the imbalance is exacerbated as the severity of the DR stage increases, affecting the classifiers' diagnostic capacity. The imbalance can be addressed by using Generative Adversarial Networks (GANs) to augment the datasets with synthetic images. Generating synthetic images is advantageous if the images produced are high-quality and diverse. Several metrics are used to evaluate the quality and diversity of synthetic images, such as the Multi-Scale Structural Similarity Index (MS-SSIM), Cosine Distance (CD), and Fréchet Inception Distance (FID). Understanding how effectively each metric evaluates the quality and diversity of GAN-based synthetic images is critical for selecting images for augmentation. To date, there has been limited analysis of the appropriateness of these metrics in the context of biomedical imagery. This work contributes an empirical assessment of these evaluation metrics as applied to synthetic Proliferative DR imagery generated by a Deep Convolutional GAN (DCGAN). Furthermore, the metrics' capacity to indicate the quality and diversity of synthetic images is assessed, and their correlation with classifier performance is analyzed. This enables a quantitative selection of synthetic imagery and an informed augmentation strategy. Results indicate that FID is suitable for evaluating quality, while MS-SSIM and CD are suitable for evaluating the diversity of synthetic imagery. Furthermore, the superior performance of Convolutional Neural Network (CNN) and EfficientNet classifiers on the augmented datasets, as indicated by the F1 and AUC scores, demonstrates the efficacy of synthetic imagery for augmenting the imbalanced dataset.
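As a rough illustration of two of the metrics named in the abstract, the sketch below computes a cosine distance between two feature vectors and a Fréchet distance between two sets of feature vectors. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names `cosine_distance` and `frechet_distance` are hypothetical, the features are assumed to be precomputed arrays (FID proper uses Inception network activations, which are omitted here), and MS-SSIM is left out because it operates on images rather than feature vectors.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two non-zero vectors."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def frechet_distance(feats_real, feats_synth):
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input is assumed to have shape (n_samples, n_features).
    Formula: ||mu_r - mu_s||^2 + Tr(C_r + C_s - 2 (C_r C_s)^(1/2)).
    """
    feats_real = np.asarray(feats_real, dtype=float)
    feats_synth = np.asarray(feats_synth, dtype=float)

    mu_r = feats_real.mean(axis=0)
    mu_s = feats_synth.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_s = np.atleast_2d(np.cov(feats_synth, rowvar=False))

    # Tr((C_r C_s)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of C_r C_s, which are real and non-negative in exact
    # arithmetic; clip small negative values caused by rounding.
    eigvals = np.linalg.eigvals(cov_r @ cov_s)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_s) - 2.0 * tr_sqrt)
```

In this reading, a lower Fréchet distance between real and synthetic features suggests the synthetic set is closer to the real distribution, while a higher average pairwise cosine distance among synthetic features suggests more diversity. How the authors threshold or combine these values is not stated in the abstract.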
