A Systematic Performance Analysis of Deep Perceptual Loss Networks Breaks Transfer Learning Conventions

Deep perceptual loss is a type of loss function in computer vision that aims to mimic human perception by using deep features extracted from neural networks. In recent years the method has been applied to great effect on a range of computer vision tasks, especially tasks with image or image-like outputs. Many applications of the method use pretrained networks, often convolutional networks, for loss calculation. Despite the increased interest and broader use, more work is needed to explore which networks to use for calculating deep perceptual loss and from which layers to extract the features. This work addresses that gap by systematically evaluating a range of commonly used and readily available pretrained networks, at a number of different feature extraction points, on four existing use cases of deep perceptual loss. The four use cases are implementations of previous works in which the selected networks and extraction points replace those used in the original work. The experimental tasks are dimensionality reduction, image segmentation, super-resolution, and perceptual similarity. The performance on these four tasks, together with attributes of the networks and extraction points, is then used as the basis for an in-depth analysis. This analysis uncovers essential information regarding which architectures provide superior performance for deep perceptual loss and how to choose an appropriate extraction point for a particular task and dataset. Furthermore, the work discusses the implications of the results for deep perceptual loss and the broader field of transfer learning. The results break commonly held assumptions in transfer learning, which implies either that deep perceptual loss deviates from most transfer learning settings or that these assumptions need a thorough re-evaluation.
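To make the terms "loss network" and "extraction point" concrete, the following is a minimal sketch of a deep perceptual loss, not the implementation evaluated in this work. It assumes a frozen stand-in network built from fixed random dense layers with ReLU activations in place of a real pretrained convolutional network, and it assumes a mean squared error between features. The names `make_frozen_network`, `extract_features`, and `deep_perceptual_loss` are hypothetical and chosen only for illustration.

```python
import random


def make_frozen_network(layer_sizes, seed=0):
    """Build a stand-in for a pretrained loss network (assumption: fixed
    random dense layers, not real pretrained weights)."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def extract_features(network, x, extraction_point):
    """Run x through the first `extraction_point` layers and return the
    activations found there."""
    if not 1 <= extraction_point <= len(network):
        raise ValueError("extraction_point must be between 1 and len(network)")
    h = list(x)
    for weights in network[:extraction_point]:
        # Dense layer followed by ReLU.
        h = [max(0.0, sum(w * v for w, v in zip(row, h))) for row in weights]
    return h


def deep_perceptual_loss(network, output, target, extraction_point):
    """Mean squared error between the features of `output` and `target`
    (assumption: MSE; other feature distances are possible)."""
    f_out = extract_features(network, output, extraction_point)
    f_tgt = extract_features(network, target, extraction_point)
    return sum((a - b) ** 2 for a, b in zip(f_out, f_tgt)) / len(f_out)


# Usage: compare the same pair of inputs at two different extraction points.
loss_net = make_frozen_network([4, 6, 5, 3])
prediction = [0.2, 0.4, 0.1, 0.9]
reference = [0.3, 0.4, 0.0, 0.8]
early_loss = deep_perceptual_loss(loss_net, prediction, reference, 1)
late_loss = deep_perceptual_loss(loss_net, prediction, reference, 3)
```

In this sketch, changing the loss network corresponds to swapping `loss_net`, and changing the extraction point corresponds to changing the last argument; these are the two choices the evaluation varies.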
