We identify an issue in multi-task learnable compression: a representation learned for one task contributes less to the rate-distortion performance of a different task than expected, given the estimated amount of information it contains. We interpret this issue using the predictive $\mathcal{V}$-information framework. In learnable scalable coding, previous work increased the utilization of side information for input reconstruction by also rewarding input reconstruction when learning the shared representation. We evaluate the impact of this idea on input reconstruction more rigorously and extend it to other computer vision tasks. We perform experiments using representations trained for object detection on COCO 2017 and for depth estimation on Cityscapes, and use them to assist image reconstruction and semantic segmentation. The results show considerable improvements in the rate-distortion performance of the assisted tasks. Moreover, the proposed representations also improve the performance of the base tasks. The results suggest that the proposed method induces simpler representations that are more compatible with downstream processes.
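The idea of "also rewarding input reconstruction" when learning a shared representation can be illustrated with a Lagrangian-style objective: an estimated rate, plus a weighted base-task distortion, plus an optional weighted input-reconstruction distortion. The sketch below is a minimal illustration under stated assumptions, not the paper's training code. The function names (`rate_bits`, `mse`, `joint_objective`), the use of mean squared error as the reconstruction distortion, and the weights `lambda_task` and `lambda_rec` are all hypothetical. The rate is estimated, as is common in learned compression, as the total negative log2-likelihood of the quantized latent symbols under an entropy model.

```python
import math
from typing import Sequence


def rate_bits(likelihoods: Sequence[float]) -> float:
    """Estimated rate in bits: total -log2 likelihood of the latent symbols."""
    return sum(-math.log2(p) for p in likelihoods)


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error (assumed reconstruction distortion)."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def joint_objective(
    likelihoods: Sequence[float],
    task_distortion: float,
    x: Sequence[float],
    x_hat: Sequence[float],
    lambda_task: float = 1.0,
    lambda_rec: float = 0.0,
) -> float:
    """Hypothetical objective: rate + lambda_task * D_task + lambda_rec * D_rec.

    lambda_rec = 0 gives a purely task-driven representation; lambda_rec > 0
    additionally rewards reconstructing the input x from the shared latent.
    """
    return (
        rate_bits(likelihoods)
        + lambda_task * task_distortion
        + lambda_rec * mse(x, x_hat)
    )


if __name__ == "__main__":
    probs = [0.5, 0.25]            # entropy-model likelihoods -> 1 + 2 = 3 bits
    x, x_hat = [1.0, 2.0], [1.0, 1.0]
    print(joint_objective(probs, 0.5, x, x_hat, lambda_task=10.0))
    print(joint_objective(probs, 0.5, x, x_hat, lambda_task=10.0, lambda_rec=2.0))
```

With `lambda_rec=0` the example evaluates to 3 + 10 × 0.5 = 8.0; setting `lambda_rec=2` adds 2 × 0.5 = 1.0 for a total of 9.0. In this sketch, the reconstruction term only changes which latent the optimizer prefers, which is one simple way to express "rewarding" reconstruction during representation learning.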