Chest X-ray (CXR) imaging is among the most commonly used diagnostic imaging modalities in clinical practice. Stringent privacy constraints often limit the public dissemination of patient CXR images, contributing to the increasing use of synthetic images produced by deep generative models for data sharing and training machine learning models. Given the high-stakes downstream applications of CXR images, it is crucial to evaluate how faithfully synthetic images reflect the underlying target distribution. We propose the embedded characteristic score (ECS), a flexible evaluation procedure that compares synthetic and patient CXR samples through characteristic function transforms of feature embeddings. The choice of embedding can be tailored to the clinical or scientific context of interest. By leveraging the behavior of characteristic functions near the origin, ECS is sensitive to differences in higher moments and distribution tails, aspects that are often overlooked by commonly used evaluation metrics such as the Fréchet Inception Distance (FID). We establish theoretical properties of ECS and describe a calibration strategy based on a simple resampling procedure. We compare the empirical performance of ECS against FID via simulations and standard benchmark imaging datasets. Assessing synthetic CXR images with ECS uncovers clinically relevant distributional discrepancies relative to patient CXR images. These results highlight the importance of reliable evaluation of synthetic data that inform high-stakes decisions.
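To make the idea concrete, the sketch below compares two sets of feature embeddings through their empirical characteristic functions and calibrates the resulting statistic with a permutation test. It is an illustrative stand-in, not the ECS definition from the paper: the abstract does not specify the statistic, the frequency distribution, or the resampling scheme, so the function names (`ecf`, `cf_discrepancy`, `permutation_pvalue`), the Gaussian frequencies with scale `0.5`, and the use of permutations are all assumptions made for this example.

```python
import numpy as np


def ecf(x, t):
    """Empirical characteristic function of sample x (n, d) at frequencies t (m, d)."""
    proj = x @ t.T                          # (n, m) inner products <t_j, x_i>
    return np.exp(1j * proj).mean(axis=0)   # average of exp(i <t_j, x_i>) over the sample


def cf_discrepancy(x, y, t):
    """Mean squared modulus of the ECF difference over the frequencies t.

    Hypothetical statistic chosen for illustration; not the paper's ECS.
    """
    diff = ecf(x, t) - ecf(y, t)
    return float(np.mean(np.abs(diff) ** 2))


def permutation_pvalue(x, y, t, n_perm=200, seed=None):
    """Calibrate the discrepancy by permuting the pooled sample (an assumed resampling scheme)."""
    rng = np.random.default_rng(seed)
    observed = cf_discrepancy(x, y, t)
    pooled = np.vstack([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        stat = cf_discrepancy(pooled[idx[:n]], pooled[idx[n:]], t)
        exceed += int(stat >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy usage: random vectors stand in for image embeddings (assumption).
rng = np.random.default_rng(0)
d = 8
real = rng.standard_normal((300, d))
# Heavier-tailed sample rescaled to unit variance, so mean and covariance roughly match.
synth = rng.standard_t(df=5, size=(300, d)) * np.sqrt(3 / 5)
# Frequencies concentrated near the origin; the scale 0.5 is an arbitrary choice.
freqs = 0.5 * rng.standard_normal((64, d))

stat, pval = permutation_pvalue(real, synth, freqs, n_perm=200, seed=1)
print(f"discrepancy={stat:.5f}, permutation p-value={pval:.3f}")
```

The toy data are built so that the two samples agree in mean and covariance and differ only in tail behavior, which is the kind of discrepancy a mean-and-covariance summary such as FID cannot see. Whether this simple statistic detects the difference at a given sample size depends on the frequency scale and on the number of frequencies.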