We propose LoLep, a novel method that regresses Locally-Learned planes from a single RGB image to represent scenes accurately, thus generating better novel views. Without depth information, regressing appropriate plane locations is a challenging problem. To address it, we pre-partition the disparity space into bins and design a disparity sampler that regresses local offsets for multiple planes within each bin. However, using such a sampler alone prevents the network from converging; we therefore propose two optimizing strategies tailored to the different disparity distributions of datasets, together with an occlusion-aware reprojection loss as a simple yet effective geometric supervision technique. We also introduce a self-attention mechanism to improve occlusion inference and present a Block-Sampling Self-Attention (BS-SA) module to address the difficulty of applying self-attention to large feature maps. We demonstrate the effectiveness of our approach and achieve state-of-the-art results on different datasets. Compared to MINE, our approach reduces LPIPS by 4.8%-9.0% and RV by 73.9%-83.5%. We also evaluate performance on real-world images and demonstrate the benefits.
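The bin-partitioned disparity sampler described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the uniform bin partition, and the offset parameterization (offsets in [0, 1) scaled by the bin width) are all illustrative assumptions.

```python
import numpy as np

def plane_disparities(d_min, d_max, num_bins, planes_per_bin, offsets):
    """Hypothetical sketch of the sampler: pre-partition the disparity
    range [d_min, d_max] into uniform bins, then place multiple planes
    inside each bin at locally-regressed offsets.

    `offsets` has shape (num_bins, planes_per_bin) with values in [0, 1);
    in the actual method these would be regressed by the network.
    """
    edges = np.linspace(d_min, d_max, num_bins + 1)
    width = edges[1] - edges[0]
    # Each plane's disparity = its bin's left edge + offset * bin width,
    # so planes stay local to their bin while their positions are learned.
    return edges[:-1, None] + offsets * width

# Example: 4 bins, 2 planes per bin, offsets at the bin quarter points.
offsets = np.full((4, 2), [0.25, 0.75])
d = plane_disparities(0.0, 1.0, 4, 2, offsets)
# d[0] covers the first bin [0, 0.25), d[3] the last bin [0.75, 1.0).
```

Constraining each plane to its own bin keeps the set of plane disparities ordered and spread across the whole range, which is what makes the regression tractable without depth supervision.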