Similarity metrics play a significant role in computer vision, where they are used to capture the underlying semantics of images. In recent years, advanced similarity metrics such as the Learned Perceptual Image Patch Similarity (LPIPS) have emerged. These metrics leverage deep features extracted from trained neural networks and align closely with human perception when evaluating relative image similarity. However, it is now well known that neural networks are susceptible to adversarial examples, i.e., small perturbations that are imperceptible to humans and crafted to deliberately mislead the model. Consequently, the LPIPS metric is also sensitive to such adversarial examples. This susceptibility raises significant security concerns, especially given the widespread adoption of LPIPS in large-scale applications. In this paper, we propose the Robust Learned Perceptual Image Patch Similarity (R-LPIPS) metric, a new metric that leverages adversarially trained deep features. Through a comprehensive set of experiments, we demonstrate the superiority of R-LPIPS over the classical LPIPS metric. The code is available at \url{https://github.com/SaraGhazanfari/R-LPIPS}.