We propose LoLep, a novel method that regresses Locally-Learned planes from a single RGB image to represent scenes accurately and thereby generate better novel views. Without depth information, regressing appropriate plane locations is challenging. To address this, we pre-partition the disparity space into bins and design a disparity sampler that regresses local offsets for multiple planes in each bin. However, a network trained with this sampler alone does not converge; we therefore propose two optimization strategies suited to the different disparity distributions of the datasets, along with an occlusion-aware reprojection loss as a simple yet effective form of geometric supervision. We also introduce a self-attention mechanism to improve occlusion inference and present a Block-Sampling Self-Attention (BS-SA) module that addresses the problem of applying self-attention to large feature maps. Our approach achieves state-of-the-art results on different datasets. Compared to MINE, it reduces LPIPS by 4.8%-9.0% and RV by 73.9%-83.5%. We also evaluate the method on real-world images and demonstrate its benefits.
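The bin-and-offset idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the equal-width bins, the offset range of [0, 1], and the function names `bin_edges` and `place_planes` are all assumptions made for illustration, and the offsets are passed in by hand rather than regressed by a network.

```python
from typing import List


def bin_edges(d_min: float, d_max: float, num_bins: int) -> List[float]:
    """Split [d_min, d_max] into equal-width disparity bins (assumed layout)."""
    width = (d_max - d_min) / num_bins
    return [d_min + i * width for i in range(num_bins + 1)]


def place_planes(edges: List[float], offsets: List[List[float]]) -> List[float]:
    """Map per-bin local offsets to plane disparities.

    offsets[i] stands in for the sampler's outputs for bin i (hypothetical).
    Each offset is clamped to [0, 1], so a plane can never leave its bin.
    """
    planes = []
    for i, bin_offsets in enumerate(offsets):
        lo, hi = edges[i], edges[i + 1]
        for t in bin_offsets:
            t = min(max(t, 0.0), 1.0)  # keep the plane inside its bin
            planes.append(lo + t * (hi - lo))
    # Disparity is inverse depth, so ascending order runs from far to near.
    return sorted(planes)


if __name__ == "__main__":
    edges = bin_edges(0.0, 1.0, 4)
    # Two planes per bin, with arbitrary example offsets.
    print(place_planes(edges, [[0.2, 0.8]] * 4))
```

Restricting each offset to its own bin keeps the planes spread across the whole disparity range while still letting their exact positions adapt to the scene, which is the motivation the abstract gives for regressing local rather than global plane locations.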