Text-to-image generation is increasingly applied in medical domains for purposes such as data augmentation and education, so evaluating the quality and clinical reliability of the generated images is essential. However, existing methods mainly assess image realism or diversity and fail to capture whether the generated images reflect the intended clinical semantics, such as anatomical location and pathology. In this study, we propose the Clinical Semantics Evaluator (CSEval), a framework that leverages language models to assess clinical semantic alignment between generated images and their conditioning prompts. Our experiments show that CSEval identifies semantic inconsistencies overlooked by other metrics and correlates with expert judgment. CSEval provides a scalable and clinically meaningful complement to existing evaluation methods, supporting the safe adoption of generative models in healthcare.