Semantic communication (SC) is expected to be a paradigm shift that catalyzes next-generation communication, moving the main concern from accurate bit transmission to effective exchange of semantic information. However, the widely used image metrics are not suitable for evaluating image semantic similarity in SC. Classical metrics for the similarity between two images, such as PSNR and MS-SSIM, usually operate at the pixel level or the structural level. Directly applying tailored deep-learning-based metrics from the computer vision (CV) community, such as LPIPS, is also infeasible for SC. To tackle this, inspired by BERTScore from the natural language processing (NLP) community, we propose a novel metric for evaluating image semantic similarity, named Vision Transformer Score (ViTScore). We prove theoretically that ViTScore has three important properties, namely symmetry, boundedness, and normalization, which make it convenient and intuitive for image measurement. To evaluate its performance, we compare ViTScore with three typical metrics (PSNR, MS-SSIM, and LPIPS) in five classes of experiments. Experimental results demonstrate that ViTScore evaluates image semantic similarity better than the other three metrics, which indicates that it is an effective performance metric for SC scenarios.
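To make the BERTScore analogy concrete, the following is a minimal sketch of BERTScore-style greedy matching applied to two sets of patch embeddings. It is not the definition of ViTScore, which the abstract does not give. The function name `greedy_match_f1` is hypothetical, the F1 combination is an assumption borrowed from BERTScore, and the random non-negative arrays are stand-ins for the patch features a Vision Transformer would produce (196 patches of dimension 768, as for a ViT-B/16 on a 224×224 image).

```python
import numpy as np


def greedy_match_f1(ref_emb: np.ndarray, cand_emb: np.ndarray) -> float:
    """BERTScore-style F1 over two sets of token embeddings (hypothetical helper)."""
    # L2-normalise each row so that dot products are cosine similarities.
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    sim = ref @ cand.T  # shape (n_ref, n_cand)
    # Greedy matching: each token is paired with its most similar counterpart.
    recall = sim.max(axis=1).mean()     # averaged over reference tokens
    precision = sim.max(axis=0).mean()  # averaged over candidate tokens
    return float(2 * precision * recall / (precision + recall))


# Assumption: random non-negative features stand in for real ViT patch embeddings.
rng = np.random.default_rng(0)
patches_a = rng.random((196, 768))
patches_b = rng.random((196, 768))

score_ab = greedy_match_f1(patches_a, patches_b)
score_ba = greedy_match_f1(patches_b, patches_a)
score_aa = greedy_match_f1(patches_a, patches_a)

print(f"score(a, b) = {score_ab:.4f}")
print(f"score(b, a) = {score_ba:.4f}")
print(f"score(a, a) = {score_aa:.4f}")
```

In this sketch, swapping the two inputs exchanges precision and recall, so the F1 value is unchanged up to floating-point rounding. Cosine similarity keeps the score at or below 1, and an image compared with itself scores 1. These behaviours mirror the symmetry, boundedness, and normalization properties named in the abstract, although the paper's own proofs concern ViTScore as the authors define it.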