Image translation has wide applications, such as style transfer and modality conversion, and usually aims to generate images that are both highly realistic and faithful to the source. These problems remain difficult, especially when semantic structures must be preserved. Traditional image-level similarity metrics are of limited use, since the semantics of an image are high-level and not strongly governed by pixel-wise faithfulness to an original image. To fill this gap, we introduce SAMScore, a generic semantic structural similarity metric for evaluating the faithfulness of image translation models. SAMScore is based on the recent high-performance Segment Anything Model (SAM), which can perform semantic similarity comparisons with standout accuracy. We applied SAMScore to 19 image translation tasks and found that it outperforms all other competitive metrics on every task. We envision that SAMScore will prove to be a valuable tool that helps drive the vibrant field of image translation by allowing more precise evaluations of new and evolving translation models. The code is available at https://github.com/Kent0n-Li/SAMScore.
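The abstract does not spell out how the score is computed. As a rough sketch only, assuming a metric of this kind compares encoder embeddings of the source and translated images by cosine similarity at each spatial location and then averages, the comparison step might look like the following. The function name `embedding_cosine_score`, the `(C, H, W)` layout, and the `eps` parameter are illustrative assumptions, not taken from the SAMScore code. The embeddings are assumed to come from a SAM image encoder, which is not loaded here.

```python
import numpy as np


def embedding_cosine_score(src_emb, gen_emb, eps=1e-8):
    """Mean per-location cosine similarity between two (C, H, W) embedding maps.

    Hypothetical helper: both inputs are assumed to be embeddings of the
    source image and the translated image from the same encoder.
    """
    src = np.asarray(src_emb, dtype=np.float64)
    gen = np.asarray(gen_emb, dtype=np.float64)
    if src.ndim != 3 or src.shape != gen.shape:
        raise ValueError("expected two embedding maps with the same (C, H, W) shape")

    # Dot product and norms are taken over the channel axis, one value per location.
    dot = (src * gen).sum(axis=0)
    norms = np.linalg.norm(src, axis=0) * np.linalg.norm(gen, axis=0)

    # eps guards against division by zero for all-zero feature vectors.
    cosine_map = dot / (norms + eps)
    return float(cosine_map.mean())
```

Under these assumptions, a score near 1 would indicate that the translated image keeps the semantic structure of the source, regardless of how much the pixel values themselves have changed.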