Deep Perceptual Similarity is Adaptable to Ambiguous Contexts

The concept of image similarity is ambiguous: images can be similar in one context and dissimilar in another. This ambiguity motivates the creation of metrics for specific contexts. This work explores the ability of deep perceptual similarity (DPS) metrics to adapt to a given context. DPS metrics compare images using the deep features of neural networks. These metrics have been successful on datasets that capture average human perception in limited settings, but it remains an open question whether they can be adapted to specific similarity contexts. No single metric can suit all similarity contexts, and previous rule-based metrics are labor-intensive to rewrite for new contexts. DPS metrics, in contrast, use neural networks that might be retrained for each context. However, retraining networks takes resources and might harm performance on previous tasks. This work examines the adaptability of DPS metrics by training ImageNet-pretrained CNNs to measure similarity according to given contexts. Contexts are created by randomly ranking six image distortions; within a context, distortions later in the ranking are considered more disruptive to similarity when applied to an image. This setup also gives insight into whether the pretrained features capture different similarity contexts. The adapted metrics are then evaluated on a perceptual similarity dataset to determine whether adapting to a ranking affects their prior performance. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as the baselines, performance is improved in 99% of cases. Finally, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity. The implementation of this work is available online: https://github.com/LTU-Machine-Learning/Analysis-of-Deep-Perceptual-Loss-Networks
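To make the notion of a context concrete, the following is a minimal sketch of how a random ranking of six distortions could be drawn and used to check a metric. It is an illustration under stated assumptions, not the implementation from the linked repository: the distortion names, the helper functions, and the pairwise agreement score are all hypothetical, since the abstract states only that six distortions are randomly ranked and that later distortions count as more disruptive.

```python
import random
from itertools import combinations

# Hypothetical distortion names; the abstract only says there are six.
DISTORTIONS = ["blur", "noise", "rotation", "color_shift", "compression", "occlusion"]


def make_context(rng):
    """Return one similarity context: a random ranking of the distortions.

    Distortions later in the returned list are treated as more disruptive.
    """
    ranking = list(DISTORTIONS)
    rng.shuffle(ranking)
    return ranking


def more_similar(context, distortion_a, distortion_b):
    """Return the distortion that should leave an image closer to the original.

    Under a given context this is the distortion ranked earlier.
    """
    rank = {name: position for position, name in enumerate(context)}
    return distortion_a if rank[distortion_a] < rank[distortion_b] else distortion_b


def ranking_agreement(context, distances):
    """Fraction of distortion pairs that a metric orders as the context does.

    `distances` maps each distortion name to the distance the metric assigns
    between an original image and its distorted version. This score is an
    assumed, illustrative measure, not the evaluation used in the paper.
    """
    # combinations() keeps list order, so `earlier` precedes `later` in the ranking.
    pairs = list(combinations(context, 2))
    agreeing = sum(1 for earlier, later in pairs if distances[earlier] < distances[later])
    return agreeing / len(pairs)
```

For example, a metric whose distances grow with rank position agrees with the context on every pair, while one whose distances shrink with rank position agrees on none. An adapted DPS metric would be expected to move toward the former for the context it was trained on.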
