Image compression is a significant challenge in the era of information explosion. Recent studies have shown that learning-based image compression methods built on deep learning outperform traditional codecs. However, these methods inherently lack interpretability. Based on an analysis of how compression degradation varies across frequency bands, we propose an end-to-end optimized image compression model built on a frequency-oriented transform. The proposed model consists of four components: spatial sampling, frequency-oriented transform, entropy estimation, and frequency-aware fusion. The frequency-oriented transform separates the original image signal into distinct frequency bands, which aligns with a human-interpretable concept. Leveraging the non-overlapping hypothesis, the model enables scalable coding through the selective transmission of arbitrary frequency components. Extensive experiments demonstrate that our model outperforms all traditional codecs, including the next-generation standard H.266/VVC, on the MS-SSIM metric. Moreover, visual analysis tasks (object detection and semantic segmentation) verify that the proposed compression method preserves semantic fidelity in addition to signal-level precision.
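The idea of splitting a signal into frequency bands and transmitting only some of them can be illustrated with a small hand-crafted sketch. This is not the learned frequency-oriented transform of the proposed model: it assumes a fixed moving-average filter on a 1-D row of pixel values, and the names `box_blur`, `split_bands`, and `reconstruct` are hypothetical helpers introduced only for this illustration. The bands are built so that they add back up to the input, which is a simplified stand-in for the non-overlapping hypothesis.

```python
def box_blur(signal, radius):
    """Low-pass filter: moving average, with the window clipped at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def split_bands(signal, radii=(1, 4)):
    """Split a signal into additive bands, ordered from low to high frequency.

    Illustrative assumption: each band is the difference between two
    successively smoother versions of the input, so the bands sum to the input.
    """
    bands = []
    previous = list(signal)
    for radius in sorted(radii):
        smooth = box_blur(signal, radius)
        # Detail removed by this level of smoothing.
        bands.append([p - s for p, s in zip(previous, smooth)])
        previous = smooth
    bands.append(previous)  # the remaining low-frequency content
    return bands[::-1]      # index 0 is the lowest band


def reconstruct(bands, keep):
    """Scalable decoding: add up only the bands whose indices are in `keep`."""
    out = [0.0] * len(bands[0])
    for index in keep:
        out = [o + b for o, b in zip(out, bands[index])]
    return out


# A toy row of pixel values with a slow ramp plus fast variation.
signal = [float((i * 37) % 11 + (i // 8) * 5) for i in range(32)]
bands = split_bands(signal)

coarse = reconstruct(bands, keep=[0])        # lowest band only
full = reconstruct(bands, keep=[0, 1, 2])    # all bands
print(max(abs(a - b) for a, b in zip(full, signal)))
```

Decoding with `keep=[0]` yields a smoothed approximation, and adding further bands restores progressively finer detail, which mirrors the selective transmission of frequency components described above. In the proposed model the transform and the fusion step are learned end to end rather than fixed filters.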