In this paper, we present ECSIC, a novel learned method for stereo image compression. Our method compresses the left and right images jointly. It exploits the mutual information between the two images of a stereo pair through a novel stereo cross-attention (SCA) module and two stereo context modules. The SCA module performs cross-attention restricted to the corresponding epipolar lines of the two images and processes them in parallel. The stereo context modules improve the entropy estimation of the second encoded image by using the first image as context. We conduct an extensive ablation study that demonstrates the effectiveness of the proposed modules. We also provide a comprehensive quantitative and qualitative comparison with existing methods. ECSIC achieves state-of-the-art stereo image compression performance on two popular stereo image datasets, Cityscapes and InStereo2k, while allowing fast encoding and decoding.
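Assume the stereo pair is rectified, which is the usual setting. In that case corresponding epipolar lines are simply matching image rows, so cross-attention restricted to epipolar lines reduces to attention within each row, batched over all rows at once. The NumPy sketch below illustrates only this idea. The function name `epipolar_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and the single-head formulation are hypothetical choices made for illustration. This is not the actual ECSIC SCA module, which may add further components such as normalization, multiple heads, or residual connections.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def epipolar_cross_attention(query_feat, key_feat, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention restricted to epipolar lines.

    Assumes a rectified stereo pair, so the epipolar line of a pixel in row h
    of one view is row h of the other view.
    query_feat, key_feat: (H, W, C) feature maps of the two views.
    w_q, w_k, w_v: (C, D) projection matrices (illustrative assumption).
    Returns: (H, W, D) features where row h only used row h of key_feat.
    """
    q = query_feat @ w_q  # (H, W, D) queries from the current view
    k = key_feat @ w_k    # (H, W, D) keys from the other view
    v = key_feat @ w_v    # (H, W, D) values from the other view
    d = q.shape[-1]
    # scores[h, i, j]: similarity of pixel (h, i) to pixel (h, j) in the other
    # view; all H rows are computed in one batched einsum.
    scores = np.einsum("hid,hjd->hij", q, k) / np.sqrt(d)  # (H, W, W)
    attn = softmax(scores, axis=-1)
    return np.einsum("hij,hjd->hid", attn, v)  # (H, W, D)


# Usage: let each view attend to the other along corresponding rows.
rng = np.random.default_rng(0)
H, W, C, D = 4, 6, 8, 5
left = rng.standard_normal((H, W, C))
right = rng.standard_normal((H, W, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) for _ in range(3))
left_ctx = epipolar_cross_attention(left, right, w_q, w_k, w_v)
right_ctx = epipolar_cross_attention(right, left, w_q, w_k, w_v)
```

Restricting attention to rows keeps the score tensor at H·W·W entries instead of the (H·W)² entries of full-image attention. This reduction is what makes processing all epipolar lines in parallel practical for high-resolution images.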