A Systematic Performance Analysis of Deep Perceptual Loss Networks: Breaking Transfer Learning Conventions

Deep perceptual loss is a type of loss function in computer vision that aims to mimic human perception by using deep features extracted from neural networks. In recent years, the method has been applied to great effect on a range of computer vision tasks, especially those with image or image-like outputs, such as image synthesis, segmentation, and depth prediction. Many applications of the method use pretrained networks, often convolutional networks, for loss calculation. Despite the increased interest and broader use, more work is needed to determine which networks to use for calculating deep perceptual loss and from which layers to extract the features. This work addresses that gap by systematically evaluating a range of commonly used, readily available pretrained networks at a number of different feature extraction points on four existing use cases of deep perceptual loss: perceptual similarity, super-resolution, image segmentation, and dimensionality reduction. Each use case is evaluated through a benchmark, an implementation of a previous work in which the selected networks and extraction points are evaluated. The benchmark performance, together with attributes of the networks and extraction points, then serves as the basis for an in-depth analysis. This analysis yields insight into which architectures provide superior performance for deep perceptual loss and how to choose an appropriate extraction point for a particular task and dataset. Furthermore, the work discusses the implications of the results for deep perceptual loss and the broader field of transfer learning. The results show that deep perceptual loss deviates from two commonly held conventions in transfer learning, which suggests that those conventions need deeper analysis.
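
As a rough illustration of the idea described above, the following minimal sketch computes a loss between a model output and its target in the feature space of a fixed network, at a chosen extraction point. It is not the implementation used in this work: the small randomly initialized network merely stands in for a pretrained loss network, the mean squared error between features is one common choice of comparison, and all names (`make_toy_network`, `extract_features`, `deep_perceptual_loss`, `extraction_layer`) are hypothetical.

```python
import numpy as np


def make_toy_network(rng, sizes):
    """Build fixed random weight matrices.

    Assumption: this toy network only stands in for a pretrained loss
    network; in practice the weights would come from pretraining.
    """
    return [
        rng.standard_normal((n_in, n_out)) / np.sqrt(n_in)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def extract_features(weights, x, extraction_layer):
    """Return the activations after the first `extraction_layer` layers."""
    h = x
    for w in weights[:extraction_layer]:
        h = np.maximum(h @ w, 0.0)  # linear layer followed by ReLU
    return h


def deep_perceptual_loss(weights, output, target, extraction_layer):
    """Mean squared error between deep features of output and target."""
    f_out = extract_features(weights, output, extraction_layer)
    f_tgt = extract_features(weights, target, extraction_layer)
    return float(np.mean((f_out - f_tgt) ** 2))


rng = np.random.default_rng(0)
loss_net = make_toy_network(rng, [64, 32, 16, 8])  # hypothetical layer sizes

# A batch of 4 flattened "images" and a slightly perturbed prediction.
target = rng.standard_normal((4, 64))
output = target + 0.1 * rng.standard_normal((4, 64))

# The same pair of inputs gives a different loss at each extraction point.
for layer in (1, 2, 3):
    print(layer, deep_perceptual_loss(loss_net, output, target, layer))
```

The loop at the end shows why the extraction point matters: the loss network and the inputs stay the same, yet the value being optimized changes with the layer from which features are taken.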
