FFCA-Net: Stereo Image Compression via Fast Cascade Alignment of Side Information

Multi-view compression, and Stereo Image Compression (SIC) in particular, plays a crucial role in car-mounted cameras and 3D-related applications. Distributed Source Coding (DSC) theory suggests that correlated sources can be compressed efficiently through independent encoding and joint decoding, which has motivated the rapid development of deep distributed SIC methods in recent years. However, these approaches neglect the unique characteristics of stereo-imaging tasks and incur high decoding latency. To address these limitations, we propose a Feature-based Fast Cascade Alignment network (FFCA-Net) that fully leverages the side information at the decoder. FFCA adopts a coarse-to-fine cascaded alignment approach. In the first stage, FFCA uses a feature-domain patch-matching module based on stereo priors. This module removes redundancy from the search space of naive matching methods and reduces the noise that such matching introduces. In the second stage, an hourglass-based sparse stereo refinement network further aligns inter-image features at a reduced computational cost. Finally, we design a lightweight yet high-performance feature fusion network, the Fast Feature Fusion network (FFF), to decode the aligned features. Experimental results on the InStereo2K, KITTI, and Cityscapes datasets show that our approach outperforms both traditional and learning-based SIC methods. In particular, it decodes 3 to 10 times faster than other methods.
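To make the first alignment stage more concrete, the following is a minimal sketch of patch matching restricted by a stereo prior. It is not the FFCA-Net module itself. It assumes that the "stereo prior" is the epipolar constraint of rectified stereo pairs, so a patch is only compared against patches on the same rows of the side-information feature map, within a bounded horizontal shift. The function name `row_constrained_patch_match`, the parameters `patch` and `max_disp`, and the sum-of-squared-differences cost are all hypothetical choices made for illustration.

```python
import numpy as np

def row_constrained_patch_match(feat_x, feat_y, patch=4, max_disp=8):
    """Align side-information features feat_y to feat_x, patch by patch.

    Illustrative assumption: correspondences lie on the same rows, so the
    search is 1-D (horizontal shifts in [-max_disp, max_disp]) rather than
    a full 2-D search over the feature map.
    Both inputs have shape (C, H, W), with H and W divisible by `patch`.
    """
    C, H, W = feat_x.shape
    assert feat_y.shape == feat_x.shape
    assert H % patch == 0 and W % patch == 0

    aligned = np.zeros_like(feat_x)
    shifts = np.zeros((H // patch, W // patch), dtype=int)

    for i in range(0, H, patch):
        for j in range(0, W, patch):
            query = feat_x[:, i:i + patch, j:j + patch]
            best_cost, best_d = np.inf, 0
            for d in range(-max_disp, max_disp + 1):
                c = j + d
                if c < 0 or c + patch > W:
                    continue  # candidate would fall outside the feature map
                cand = feat_y[:, i:i + patch, c:c + patch]
                cost = float(np.sum((query - cand) ** 2))  # SSD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            # Copy the best-matching side-information patch into place.
            src = j + best_d
            aligned[:, i:i + patch, j:j + patch] = feat_y[:, i:i + patch, src:src + patch]
            shifts[i // patch, j // patch] = best_d
    return aligned, shifts

# Toy check: feat_x is feat_y shifted right by 3 columns (with wrap-around).
rng = np.random.default_rng(0)
feat_y = rng.standard_normal((8, 8, 32))
feat_x = np.roll(feat_y, 3, axis=2)
aligned, shifts = row_constrained_patch_match(feat_x, feat_y)
```

With `patch=4` and `max_disp=8`, each query patch is compared against at most 17 candidates instead of every patch position in the map, which is the kind of search-space reduction the abstract describes. A learned refinement stage would then correct the residual misalignment that such coarse, patch-level shifts leave behind.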
