Image coding for machines (ICM) aims to compress images for analysis by machine recognition models rather than for human vision. In ICM, it is therefore important for the encoder to recognize and compress the information necessary for the machine recognition task. There are two main approaches in learned ICM: optimization of the compression model with a task loss, and Region of Interest (ROI) based bit allocation. Both approaches provide the encoder with recognition capability. However, optimization with a task loss becomes difficult when the recognition model is deep, and ROI-based methods often incur extra overhead during evaluation. In this study, we propose a novel training method for learned ICM models that applies an auxiliary loss to the encoder to improve its recognition capability and rate-distortion performance. Compared with the conventional training method, our method achieves Bjøntegaard Delta rate (BD-rate) improvements of 27.7% and 20.3% in object detection and semantic segmentation, respectively.
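For readers unfamiliar with the reported metric, the following is a minimal sketch of how a Bjontegaard Delta rate is commonly computed from two rate-accuracy curves. It is not taken from the paper: the function name `bd_rate`, the cubic fit of log-rate against task accuracy, and the sample numbers are all illustrative assumptions. Under the sign convention used here, a negative result means the test codec needs fewer bits than the anchor at equal accuracy, so a reported rate improvement would appear as a negative value.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bitrate difference (in percent) of test vs. anchor.

    Hypothetical helper: fits a cubic polynomial to log-rate as a function
    of quality (e.g. mAP or mIoU) for each codec, then averages the gap
    between the two fits over the overlapping quality interval.
    """
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Cubic fit of log-rate vs. quality for each curve.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Only compare over the quality range covered by both curves.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())

    # Mean value of each fitted polynomial over [lo, hi].
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    # Convert the mean log-rate gap back to a percentage.
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0


# Made-up operating points (bits per pixel, task accuracy), not paper data.
anchor_bpp = [0.2, 0.4, 0.8, 1.6]
anchor_acc = [0.30, 0.36, 0.40, 0.42]
# A test codec that reaches the same accuracy with 20% fewer bits.
test_bpp = [0.8 * r for r in anchor_bpp]
test_acc = list(anchor_acc)

example = bd_rate(anchor_bpp, anchor_acc, test_bpp, test_acc)
print(f"BD-rate: {example:.2f}%")  # close to -20 for this synthetic case
```

The synthetic case is chosen so the answer is known in advance: scaling every rate by 0.8 shifts the log-rate curve by a constant, so the averaged gap is exactly log(0.8).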