Traditional human vision-centric image compression methods are suboptimal for machine vision-centric compression because machine vision relies on different visual properties and feature characteristics. To address this problem, we propose Channel Importance-driven learned Image Coding for Machines (CI-ICM), which aims to maximize the performance of machine vision tasks under a given bitrate constraint. First, we propose a Channel Importance Generation (CIG) module that quantifies the importance of each feature channel for machine vision, together with a channel order loss that ranks the channels in descending order of importance. Second, to allocate bitrate properly among feature channels, we propose a Feature Channel Grouping and Scaling (FCGS) module that groups the feature channels non-uniformly according to their importance and adjusts the dynamic range of each group. Building on FCGS, we further propose a Channel Importance-based Context (CI-CTX) module that allocates bits among the feature groups and preserves higher fidelity in critical channels. Third, we propose a Task-Specific Channel Adaptation (TSCA) module that adaptively enhances features for multiple downstream machine tasks. Experimental results on the COCO2017 dataset show that the proposed CI-ICM achieves BD-mAP@50:95 gains of 16.25$\%$ in object detection and 13.72$\%$ in instance segmentation over the established baseline codec. Ablation studies validate the effectiveness of each contribution, and a computational complexity analysis indicates that CI-ICM is practical. This work establishes feature channel optimization for machine vision-centric compression, bridging the gap between image coding and machine perception.
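The ranking, non-uniform grouping, and per-group scaling described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how importance scores are computed, how group sizes are chosen, or how scale factors are obtained, so the helper names (`rank_channels`, `group_channels`, `scale_groups`), the group sizes, and the scale values below are all hypothetical. The sketch assumes only that a larger scale applied before quantization preserves more detail in that group.

```python
from typing import List, Sequence


def rank_channels(importance: Sequence[float]) -> List[int]:
    """Return channel indices sorted by descending importance score."""
    return sorted(range(len(importance)), key=lambda c: importance[c], reverse=True)


def group_channels(order: Sequence[int], group_sizes: Sequence[int]) -> List[List[int]]:
    """Split ranked channel indices into consecutive groups of the given sizes."""
    if sum(group_sizes) != len(order):
        raise ValueError("group sizes must sum to the number of channels")
    groups: List[List[int]] = []
    start = 0
    for size in group_sizes:
        groups.append(list(order[start:start + size]))
        start += size
    return groups


def scale_groups(
    features: Sequence[Sequence[float]],
    groups: Sequence[Sequence[int]],
    scales: Sequence[float],
) -> List[List[float]]:
    """Multiply every value of each channel by the scale factor of its group."""
    scaled = [list(channel) for channel in features]
    for group, scale in zip(groups, scales):
        for c in group:
            scaled[c] = [v * scale for v in features[c]]
    return scaled


# Hypothetical importance scores for 8 feature channels (assumed, not from the paper).
importance = [0.1, 0.9, 0.4, 0.7, 0.2, 0.05, 0.6, 0.3]
order = rank_channels(importance)

# Non-uniform grouping: few channels in the most important group, many in the last.
groups = group_channels(order, group_sizes=[1, 2, 5])

# Assumed scale factors: important groups get a wider dynamic range.
features = [[1.0, -1.0] for _ in importance]
scaled = scale_groups(features, groups, scales=[2.0, 1.0, 0.5])
```

In this toy setup the single most important channel forms its own group and is scaled up, while the five least important channels share one group and are scaled down. In a learned codec the scores, group boundaries, and scales would presumably be learned or tuned rather than fixed by hand.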