Accurate assessment of embryo morphology is essential in assisted reproductive technology for selecting the most viable embryo. Artificial intelligence has the potential to enhance this process, but the limited availability of embryo data makes it difficult to train deep learning models. To address this, we trained two generative models, a diffusion model and a generative adversarial network, on two datasets (one that we created and made publicly available, and one existing public dataset) to generate synthetic embryo images at various cell stages, including 2-cell, 4-cell, 8-cell, morula, and blastocyst. These synthetic images were combined with real images to train classification models for embryo cell stage prediction. Incorporating synthetic images alongside real data improved classification performance: the model achieved 97% accuracy, compared with 94.5% when trained solely on real data. This trend held when the model was tested on an external Blastocyst dataset from a different clinic. Notably, even when trained exclusively on synthetic data and tested on real data, the model achieved 92% accuracy. Furthermore, combining synthetic data from both generative models yielded better classification results than using data from a single generative model. Four embryologists evaluated the fidelity of the synthetic images in a Turing test, during which they annotated inaccuracies and offered feedback. The diffusion model outperformed the generative adversarial network, deceiving embryologists 66.6% of the time versus 25.3% and achieving lower Fréchet inception distance scores.
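The three training regimes compared in the abstract (real data only, synthetic data only, and real data combined with synthetic images from both generative models) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function `build_training_set`, the `(path, label)` sample layout, the file names, and the sample counts are all hypothetical placeholders chosen for illustration.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical sample layout: (image path, cell-stage label).
Sample = Tuple[str, str]

# Cell stages named in the abstract.
STAGES = ["2-cell", "4-cell", "8-cell", "morula", "blastocyst"]


def build_training_set(
    real: List[Sample],
    synthetic_by_model: Dict[str, List[Sample]],
    seed: int = 0,
) -> List[Sample]:
    """Pool real samples with synthetic samples from every generator, then shuffle."""
    pooled: List[Sample] = list(real)
    for samples in synthetic_by_model.values():
        pooled.extend(samples)
    # A seeded generator keeps the shuffle reproducible across runs.
    random.Random(seed).shuffle(pooled)
    return pooled


# Toy placeholders standing in for image files; not data from the study.
real = [(f"real_{s}_{i}.png", s) for s in STAGES for i in range(2)]
synthetic = {
    "diffusion": [(f"diff_{s}_{i}.png", s) for s in STAGES for i in range(3)],
    "gan": [(f"gan_{s}_{i}.png", s) for s in STAGES for i in range(3)],
}

real_only = build_training_set(real, {})
synthetic_only = build_training_set([], synthetic)
combined = build_training_set(real, synthetic)

print(len(real_only), len(synthetic_only), len(combined))  # 10 30 40
```

Each resulting list would then be passed to whatever classifier training routine is in use; evaluating all three on the same held-out set of real images mirrors the comparison the abstract reports.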