Lossy image and video coding standards such as JPEG and MPEG achieve high compression rates for multimedia data intended for human consumption. However, with the growing prevalence of IoT devices, drones, and self-driving cars, machines rather than humans are processing an increasing share of captured visual content. It is therefore crucial to pursue an efficient compressed representation that serves not only human vision but also image processing and machine vision tasks. Drawing inspiration from the efficient coding hypothesis in biological systems and from models of the sensory cortex in neuroscience, we repurpose the compressed latent representation to prioritize semantic relevance while preserving perceptual distance. Our proposed method, Compressed Perceptual Image Patch Similarity (CPIPS), can be derived at minimal cost from a learned neural codec and is computed significantly faster than DNN-based perceptual metrics such as LPIPS and DISTS.