We identify an issue in multi-task learnable compression, in which a representation learned for one task does not contribute to the rate-distortion performance of a different task as much as expected, given the estimated amount of information available in it. We interpret this issue using the predictive $\mathcal{V}$-information framework. In learnable scalable coding, previous work increased the utilization of side information for input reconstruction by also rewarding input reconstruction when learning the shared representation. We evaluate the impact of this idea on input reconstruction more rigorously and extend it to other computer vision tasks. We perform experiments using representations trained for object detection on COCO 2017 and for depth estimation on Cityscapes, and use them to assist image reconstruction and semantic segmentation tasks. The results show considerable improvements in the rate-distortion performance of the assisted tasks. Moreover, with the proposed representations, the performance of the base tasks also improves. These results suggest that the proposed method induces simpler representations that are more compatible with downstream processes.