Introducing multiple viewpoints into video scenes inevitably increases the bitrate required for storage and transmission. To reduce the bitrate, researchers have developed methods that skip intermediate viewpoints during compression and delivery and later reconstruct them using Side Information (SI). Typically, depth maps are used to construct the SI. However, such methods suffer from inaccurate reconstruction and inherently high bitrates. In this paper, we propose a novel multi-view video coding method that leverages the image generation capabilities of a Generative Adversarial Network (GAN) to improve the reconstruction accuracy of SI. Additionally, we incorporate information from temporally and spatially adjacent viewpoints to further reduce SI redundancy. At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and use a convolutional network to extract the latent code of a GAN as SI. At the decoder, we combine the SI with adjacent viewpoints to reconstruct intermediate views using the GAN generator. Specifically, we establish a joint encoder constraint on reconstruction cost and SI entropy to achieve an optimal trade-off between reconstruction quality and bitrate overhead. Experiments demonstrate significantly improved Rate-Distortion (RD) performance compared with state-of-the-art methods.
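To make the encoder-side ideas concrete, the following is a minimal NumPy sketch, not the authors' implementation. It illustrates (a) a classic EPI slice taken across views at a fixed image row and (b) a joint cost of the form distortion plus a weighted rate term. Everything here is an assumption for illustration: the function names `epi_slice`, `empirical_entropy_bits`, and `rd_cost`, the use of MSE as the reconstruction cost, uniform quantization with `step`, the empirical-entropy rate estimate, and the weight `lam`. The abstract does not specify how the spatio-temporal EPI is assembled or how the entropy is modeled.

```python
import numpy as np

def epi_slice(views, row):
    """Return an EPI of shape (num_views, W) from views of shape (num_views, H, W).

    Assumption: the EPI is formed by stacking the same image row from every view.
    """
    views = np.asarray(views)
    return views[:, row, :]

def empirical_entropy_bits(symbols):
    """Shannon entropy in bits per symbol of an integer symbol array."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def rd_cost(target, recon, latent, lam=0.01, step=1.0):
    """Joint cost: MSE distortion + lam * estimated bits of the quantized latent.

    Assumption: the latent (SI) is uniformly quantized with step size `step`,
    and its rate is approximated by empirical entropy times symbol count.
    """
    q = np.round(np.asarray(latent, dtype=float) / step).astype(np.int64)
    rate_bits = empirical_entropy_bits(q) * q.size
    diff = np.asarray(target, dtype=float) - np.asarray(recon, dtype=float)
    mse = float(np.mean(diff ** 2))
    return mse + lam * rate_bits, mse, rate_bits
```

In this sketch, a larger `lam` penalizes the SI rate more heavily, pushing the trade-off toward lower bitrate at the expense of reconstruction quality. A learned codec would typically replace the empirical entropy with a differentiable entropy model.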