Recent years have seen tremendous growth in both the capability and popularity of automatic machine analysis of images and video. As a result, a need has emerged for efficient compression methods optimized for machine vision rather than human vision. To meet this demand, several methods have been developed for image and video coding for machines. Unfortunately, while there is a substantial body of knowledge regarding rate-distortion theory for human vision, the same cannot be said of machine analysis. In this paper, we extend the current rate-distortion theory for machines, providing insight into important design considerations for machine-vision codecs. We then use this understanding to improve several methods for learnable image coding for machines. Our proposed methods achieve state-of-the-art rate-distortion performance on several computer vision tasks, such as classification, instance segmentation, and object detection.