We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available in the computer vision task. We include an information analysis of both approaches to provide baselines, and we propose an entropy model for conditional coding that offers increased modelling capacity with tractability similar to that of previous work. We apply these methods to image reconstruction in two settings: one using representations created for semantic segmentation on the Cityscapes dataset, and the other using representations created for object detection on the COCO dataset. In both experiments, the conditional and residual methods perform similarly, and the resulting rate-distortion curves lie within our baselines.