We present methods for conditional and residual coding in the context of scalable coding for humans and machines. Our focus is on optimizing the rate-distortion performance of the reconstruction task using the information available from the computer vision task. We include an information analysis of both approaches to provide baselines, and we propose an entropy model suitable for conditional coding that offers increased modelling capacity with tractability similar to that of previous work. We apply these methods to image reconstruction using, in one instance, representations created for semantic segmentation on the Cityscapes dataset and, in another, representations created for object detection on the COCO dataset. In both experiments, the conditional and residual methods perform similarly, and the resulting rate-distortion curves lie within our baselines.