The Multiplane Image (MPI), a set of fronto-parallel RGBA layers, is an effective and efficient representation for view synthesis from sparse inputs. Yet its fixed structure limits performance, especially for surfaces imaged at oblique angles. We introduce the Structural MPI (S-MPI), whose plane structure approximates 3D scenes concisely. By conveying RGBA contexts with geometrically faithful structures, the S-MPI directly bridges view synthesis and 3D reconstruction. It not only overcomes the critical limitations of MPI, i.e., discretization artifacts from sloped surfaces and the wasteful use of redundant layers, but also yields a planar 3D reconstruction. Although S-MPI is intuitive and in demand, applying it introduces significant challenges, e.g., high-fidelity approximation of both RGBA layers and plane poses, multi-view consistency, modeling of non-planar regions, and efficient rendering with intersecting planes. Accordingly, we propose a transformer-based network built on a segmentation model. It predicts compact and expressive S-MPI layers with their corresponding masks, poses, and RGBA contexts. Non-planar regions are handled inclusively as a special case within our unified framework. Multi-view consistency is ensured by sharing global proxy embeddings, which encode plane-level features covering the complete 3D scene with aligned coordinates. Extensive experiments show that our method outperforms both previous state-of-the-art MPI-based view synthesis methods and planar reconstruction methods.
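For readers unfamiliar with MPI rendering, the sketch below shows the standard way a stack of RGBA layers is turned into an image: back-to-front "over" alpha compositing. It assumes the layers have already been warped into the target view and that colors are not premultiplied by alpha. It illustrates plain MPI compositing only, not the S-MPI rendering of posed, possibly intersecting planes, which the abstract does not detail. The function names `composite_over` and `composite_mpi` are hypothetical.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]
RGB = Tuple[float, float, float]


def composite_over(layers_back_to_front: List[RGBA]) -> RGB:
    """Composite one pixel's RGBA samples, ordered from farthest to nearest plane.

    Hypothetical helper; assumes non-premultiplied alpha in [0, 1].
    """
    r, g, b = 0.0, 0.0, 0.0  # start from an empty (black) background
    for lr, lg, lb, a in layers_back_to_front:
        # "over" operator: the nearer layer covers what has accumulated so far
        r = lr * a + r * (1.0 - a)
        g = lg * a + g * (1.0 - a)
        b = lb * a + b * (1.0 - a)
    return (r, g, b)


def composite_mpi(planes: List[List[List[RGBA]]]) -> List[List[RGB]]:
    """Composite a whole MPI given as planes[d][y][x], with d ordered back to front.

    Hypothetical helper; all planes must share the same height and width.
    """
    height, width = len(planes[0]), len(planes[0][0])
    return [
        [composite_over([plane[y][x] for plane in planes]) for x in range(width)]
        for y in range(height)
    ]


# Opaque red on the far plane, half-transparent blue on the near plane.
print(composite_over([(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 0.5)]))  # (0.5, 0.0, 0.5)
```

Because every layer of a plain MPI is fronto-parallel and sits at a fixed depth, a sloped surface must be spread across several layers, which is the source of the discretization artifacts and redundant layers that S-MPI targets.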