We propose LoLep, a novel method that regresses Locally-Learned planes from a single RGB image to represent scenes accurately, thus generating better novel views. Without depth information, regressing appropriate plane locations is a challenging problem. To address it, we pre-partition the disparity space into bins and design a disparity sampler that regresses local offsets for multiple planes within each bin. However, using such a sampler alone prevents the network from converging; we therefore propose two optimizing strategies tailored to the different disparity distributions of datasets, together with an occlusion-aware reprojection loss as a simple yet effective geometric supervision technique. We also introduce a self-attention mechanism to improve occlusion inference and present a Block-Sampling Self-Attention (BS-SA) module to address the difficulty of applying self-attention to large feature maps. We demonstrate the effectiveness of our approach and achieve state-of-the-art results on different datasets. Compared to MINE, our approach reduces LPIPS by 4.8%-9.0% and RV by 73.9%-83.5%. We also evaluate performance on real-world images and demonstrate the benefits.
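The bin-partitioned disparity sampler described above can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the uniform bin partition, and the offset parameterization (offsets in [0, 1) scaled by the bin width) are all illustrative assumptions.

```python
import numpy as np

def plane_disparities(d_min, d_max, num_bins, planes_per_bin, offsets):
    """Hypothetical sketch of the sampler: pre-partition the disparity
    range [d_min, d_max] into uniform bins, then place multiple planes
    inside each bin at locally-regressed offsets.

    `offsets` has shape (num_bins, planes_per_bin) with values in [0, 1);
    in the actual method these would be regressed by the network.
    """
    edges = np.linspace(d_min, d_max, num_bins + 1)
    width = edges[1] - edges[0]
    # Each plane's disparity = its bin's left edge + offset * bin width,
    # so planes stay local to their bin while their positions are learned.
    return edges[:-1, None] + offsets * width

# Example: 4 bins, 2 planes per bin, offsets at the bin quarter points.
offsets = np.full((4, 2), [0.25, 0.75])
d = plane_disparities(0.0, 1.0, 4, 2, offsets)
# d[0] covers the first bin [0, 0.25), d[3] the last bin [0.75, 1.0).
```

Constraining each plane to its own bin keeps the set of plane disparities ordered and spread across the whole range, which is what makes the regression tractable without depth supervision.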