The concept of image similarity is ambiguous: images that are considered similar in one context might not be in another. This ambiguity motivates the creation of metrics for specific contexts. This work explores the ability of deep perceptual similarity (DPS) metrics to adapt to a given context. DPS metrics, which compare images using the deep features of neural networks, have emerged recently and have been successful on datasets that capture average human perception in limited settings. However, it remains an open question whether they can be adapted to specific contexts of similarity. No single metric can suit all definitions of similarity, and previous metrics have been rule-based, which makes them labor-intensive to rewrite for new contexts. DPS metrics, on the other hand, use neural networks that could be retrained for each context. Retraining networks, however, consumes resources and might degrade performance on previous tasks. This work examines the adaptability of DPS metrics by training positive scalars for the deep features of pretrained CNNs so that the metrics correctly measure similarity in different contexts. Evaluation is performed on contexts defined by randomly ordering six image distortions (e.g., rotation) according to which distortion, when applied to an image, should leave it more similar to the original. This also gives insight into whether the features of the CNN are sufficient to discern different distortions without retraining. Finally, the trained metrics are evaluated on a perceptual similarity dataset to determine whether adapting to an ordering affects their performance on established scenarios. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as the baselines, performance is improved in 99% of cases. Lastly, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity.
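The core mechanism, re-weighting the deep features of a frozen network with trained positive scalars so that a chosen ordering of distortions is respected, can be illustrated with a small sketch. It assumes an LPIPS-style distance (features unit-normalized along the channel axis, squared differences averaged over space, one positive scalar per channel), a logistic ranking loss over pairs of distortions, and positivity enforced by parametrizing each scalar as `exp(log_w)`; none of these choices is confirmed by the text above. Random arrays stand in for pretrained CNN activations, and all function names (`channel_scores`, `train_weights`, `ordering_accuracy`, `distort`) are hypothetical.

```python
import numpy as np


def unit_normalize(feat, eps=1e-10):
    """Scale a (C, H, W) feature map to unit length along the channel axis."""
    norm = np.sqrt(np.sum(feat ** 2, axis=0, keepdims=True))
    return feat / (norm + eps)


def channel_scores(feats_x, feats_y):
    """Spatially averaged per-channel squared differences, concatenated over layers.

    feats_x, feats_y: lists of (C_l, H_l, W_l) arrays, one per CNN layer.
    With per-channel weights w, the distance is simply w @ channel_scores(...).
    """
    parts = []
    for fx, fy in zip(feats_x, feats_y):
        diff = unit_normalize(fx) - unit_normalize(fy)
        parts.append(np.mean(diff ** 2, axis=(1, 2)))
    return np.concatenate(parts)


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def train_weights(pairs, lr=0.5, steps=500):
    """Fit per-channel weights w = exp(log_w) > 0 so that d(closer) < d(farther).

    pairs: list of (s_closer, s_farther) score vectors; the distortion behind
    s_closer should count as more similar under the target ordering.
    Minimizes the mean logistic ranking loss log(1 + exp(d_closer - d_farther)).
    The CNN features themselves are never changed, only the scalars.
    """
    deltas = np.array([sc - sf for sc, sf in pairs])  # (P, n_channels)
    log_w = np.zeros(deltas.shape[1])  # start from uniform weights w = 1
    for _ in range(steps):
        w = np.exp(log_w)
        margins = deltas @ w  # d_closer - d_farther for every pair
        grad_w = (sigmoid(margins)[:, None] * deltas).mean(axis=0)
        log_w -= lr * grad_w * w  # chain rule through w = exp(log_w)
    return log_w


def ordering_accuracy(log_w, pairs):
    """Fraction of pairs where the 'closer' distortion gets the smaller distance."""
    w = np.exp(log_w)
    return float(np.mean([w @ sc < w @ sf for sc, sf in pairs]))


# --- Toy demo with random stand-in features (hypothetical, not a real CNN) ---
rng = np.random.default_rng(0)
shapes = [(8, 4, 4), (16, 2, 2)]  # two hypothetical layers


def distort(feats, layer):
    """Hypothetical stand-in for a distorted image: add Gaussian noise to one layer."""
    out = [f.copy() for f in feats]
    out[layer] = out[layer] + rng.normal(size=out[layer].shape)
    return out


demo_pairs = []
for _ in range(20):
    ref = [rng.normal(size=s) for s in shapes]
    # Arbitrary target context: distortion "A" (layer 0) counts as more
    # similar than distortion "B" (layer 1).
    s_a = channel_scores(ref, distort(ref, 0))
    s_b = channel_scores(ref, distort(ref, 1))
    demo_pairs.append((s_a, s_b))

trained_log_w = train_weights(demo_pairs)
print("uniform weights:", ordering_accuracy(np.zeros_like(trained_log_w), demo_pairs))
print("trained weights:", ordering_accuracy(trained_log_w, demo_pairs))
```

Because the distance is linear in the weights, only the scalars are optimized while the feature extractor stays frozen, which is what lets this kind of adaptation avoid retraining the network. A realistic setup would instead use features from an actual pretrained CNN and all pairs implied by an ordering of six distortions, rather than this single two-distortion toy context.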