Video transmission now serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis. However, existing codecs are primarily optimized for pixel-domain fidelity and HVS-oriented perceptual metrics rather than for the needs of machine vision tasks. To address this issue, we propose a Compression Distortion Representation Embedding (CDRE) framework, which extracts a machine-perception-related distortion representation and embeds it into downstream models, compensating for the information lost during compression and improving task performance. Specifically, to better analyze machine-perception-related distortion, we design a compression-sensitive extractor that identifies compression degradation in the feature domain. For efficient transmission, a lightweight distortion codec compresses the distortion information into a compact representation. The representation is then progressively embedded into the downstream model, making the model aware of compression degradation and improving its performance. Experiments across various codecs and downstream tasks demonstrate that our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in bitrate, execution time, and parameter count. Our code and supplementary materials are released at https://github.com/Ws-Syx/CDRE/.
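To make the three-stage data flow (extract, compress, embed) concrete, here is a toy sketch. It is not the CDRE implementation. In the framework the extractor, the distortion codec, and the embedding are learned networks. In this sketch they are replaced by fixed arithmetic, and every name is hypothetical: `distortion_representation` (per-channel mean absolute feature difference standing in for the compression-sensitive extractor), `encode`/`decode` (uniform scalar quantization with a hypothetical `step` standing in for the lightweight distortion codec), and `embed` (a FiLM-style scale-and-shift with hypothetical `scale`/`shift` weights standing in for progressive embedding). Features are plain nested lists (channels × values).

```python
from typing import List

Features = List[List[float]]  # channels x values (toy stand-in for feature maps)


def distortion_representation(orig_feats: Features, dec_feats: Features) -> List[float]:
    """Hypothetical extractor: per-channel mean absolute difference between
    features of the original and the decoded frame (encoder side, where the
    original is available)."""
    rep = []
    for o_ch, d_ch in zip(orig_feats, dec_feats):
        diffs = [abs(o - d) for o, d in zip(o_ch, d_ch)]
        rep.append(sum(diffs) / len(diffs))
    return rep


def encode(rep: List[float], step: float) -> List[int]:
    """Hypothetical lightweight codec: uniform scalar quantization to integers."""
    return [round(v / step) for v in rep]


def decode(symbols: List[int], step: float) -> List[float]:
    """Inverse of encode: map integer symbols back to representation values."""
    return [s * step for s in symbols]


def embed(dec_feats: Features, rep: List[float],
          scale: float = 0.5, shift: float = 0.1) -> Features:
    """Hypothetical embedding: modulate each channel of the downstream features
    by its distortion value; zero distortion leaves the channel unchanged."""
    return [[x * (1.0 + scale * d) + shift * d for x in ch]
            for ch, d in zip(dec_feats, rep)]


if __name__ == "__main__":
    orig = [[1.0, 2.0], [0.0, 0.0]]
    dec = [[0.5, 2.5], [0.0, 0.0]]       # channel 0 degraded, channel 1 intact
    rep = distortion_representation(orig, dec)
    symbols = encode(rep, step=0.25)      # compact side information to transmit
    received = decode(symbols, step=0.25)
    print(embed(dec, received))
```

The design point the sketch tries to mirror is that only the small quantized vector `symbols` travels alongside the main bitstream, and channels with no measured distortion pass through the downstream model unchanged.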