DNN-Compressed Domain Visual Recognition with Feature Adaptation

Learning-based image compression has been shown to achieve performance competitive with state-of-the-art transform-based codecs. This has motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems that target both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be used to perform visual processing and computer vision tasks directly in the compressed domain, without decoding back to pixels. We adopt a learning-based compressed-domain classification framework that performs visual recognition on the compressed-domain latent representation at varying bit-rates. We propose a novel feature adaptation module that integrates a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. We also design an adaptation training strategy that makes use of pretrained pixel-domain weights. For comparison, in addition to the results obtained with our proposed latent-based compressed-domain method, we present results obtained in the pixel domain using compressed but fully decoded images, as well as results obtained using the original uncompressed images. The results show that our proposed compressed-domain classification model clearly outperforms existing compressed-domain classification models, and that it achieves accuracy similar to that of pixel-domain models trained on fully decoded images while being much more computationally efficient.
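To make the idea of channel-wise attention over a compressed-domain latent more concrete, the following is a minimal sketch, not the module proposed in the paper. The abstract does not specify the architecture, so this sketch assumes a squeeze-and-excitation style gate: each latent channel is globally averaged, passed through a small bottleneck, and mapped to a weight in (0, 1) that rescales the channel. The function name `channel_attention`, the latent shape (192 channels, 16 x 16), the reduction ratio `r`, and the random weights are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(latent, w1, b1, w2, b2):
    """Rescale each channel of a (C, H, W) latent by a learned gate.

    Assumed squeeze-and-excitation style design (not taken from the paper):
      1. squeeze: global average pooling per channel
      2. excite:  two-layer bottleneck with ReLU, then sigmoid
      3. scale:   multiply each channel by its gate value
    Returns the rescaled latent and the per-channel gate values.
    """
    squeezed = latent.mean(axis=(1, 2))            # shape (C,)
    hidden = np.maximum(0.0, w1 @ squeezed + b1)   # shape (C // r,)
    gates = sigmoid(w2 @ hidden + b2)              # shape (C,), values in (0, 1)
    return latent * gates[:, None, None], gates


# Hypothetical sizes: 192 latent channels, 16x16 spatial grid, reduction ratio 16.
C, H, W, r = 192, 16, 16, 16
rng = np.random.default_rng(0)
latent = rng.standard_normal((C, H, W))

# Randomly initialized parameters stand in for trained weights.
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

adapted, gates = channel_attention(latent, w1, b1, w2, b2)
```

In a setup like this, `adapted` would be passed on to a classification backbone in place of decoded pixels. The gate adds only two small matrix products per image, which is one plausible reading of "lightweight" here.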
