Metrics to Quantify Global Consistency in Synthetic Medical Images

Image synthesis is increasingly being adopted in medical image processing, for example for data augmentation or inter-modality image translation. In these critical applications, the generated images must meet a high standard of biological correctness. A particular requirement for these images is global consistency, i.e., an image being coherent and structured overall so that all of its parts fit together in a realistic and meaningful way. Yet established image quality metrics do not explicitly quantify this property of synthetic images. In this work, we introduce two metrics that measure the global consistency of synthetic images on a per-image basis. To measure global consistency, we presume that a realistic image exhibits consistent properties, e.g., a person's body fat in a whole-body MRI, throughout the depicted object or scene. Hence, we quantify global consistency by predicting explicit attributes of images on patches with neural networks trained in a supervised manner and comparing the predictions across patches. Next, we adapt this strategy to an unlabeled setting by measuring the similarity of implicit image features predicted by a network trained in a self-supervised manner. Our results demonstrate that predicting explicit attributes of synthetic images on patches can distinguish globally consistent from inconsistent images. Implicit representations of images are less sensitive for assessing global consistency but remain serviceable when labeled data is unavailable. Compared to established metrics such as the FID, our method explicitly measures global consistency on a per-image basis, enabling a dedicated analysis of the biological plausibility of single synthetic images.
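
The patch-based idea described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the patch layout, the aggregation by mean pairwise difference or mean cosine similarity, and the function names `extract_patches`, `explicit_inconsistency`, and `implicit_consistency` are assumptions made for illustration. The callables `predict_attribute` and `embed` are hypothetical stand-ins for the supervised and self-supervised networks, replaced here by simple intensity statistics.

```python
import numpy as np


def extract_patches(image, patch_size):
    """Split a 2D image into non-overlapping square patches (edges are cropped)."""
    h, w = image.shape
    rows, cols = h // patch_size, w // patch_size
    cropped = image[: rows * patch_size, : cols * patch_size]
    patches = cropped.reshape(rows, patch_size, cols, patch_size).swapaxes(1, 2)
    return patches.reshape(rows * cols, patch_size, patch_size)


def explicit_inconsistency(patches, predict_attribute):
    """Mean absolute difference between per-patch attribute predictions.

    Assumption: a lower value indicates a more globally consistent image.
    """
    preds = np.array([predict_attribute(p) for p in patches], dtype=float)
    diffs = np.abs(preds[:, None] - preds[None, :])
    i, j = np.triu_indices(len(preds), k=1)  # each unordered pair once
    return float(diffs[i, j].mean())


def implicit_consistency(patches, embed):
    """Mean pairwise cosine similarity between per-patch feature vectors.

    Assumption: a higher value indicates a more globally consistent image.
    """
    feats = np.stack([embed(p) for p in patches]).astype(float)
    feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12
    sims = feats @ feats.T
    i, j = np.triu_indices(len(feats), k=1)
    return float(sims[i, j].mean())


# Hypothetical stand-ins for trained networks (illustration only).
def predict_attribute(patch):
    return patch.mean()


def embed(patch):
    return np.array([patch.mean(), 1.0 - patch.mean()])


# A toy "consistent" image and a toy "inconsistent" one (two mismatched halves).
consistent = np.ones((8, 8))
inconsistent = np.concatenate([np.zeros((8, 4)), np.ones((8, 4))], axis=1)

for name, img in [("consistent", consistent), ("inconsistent", inconsistent)]:
    p = extract_patches(img, patch_size=4)
    print(name, explicit_inconsistency(p, predict_attribute), implicit_consistency(p, embed))
```

In practice, `predict_attribute` would be a network trained to regress or classify a labeled attribute from a patch, and `embed` would return the feature vector of a self-supervised encoder. Both scores are computed from a single image, which is what allows a per-image analysis.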
