With the growing demand for immersive digital applications, the need to understand and reconstruct 3D scenes has increased significantly. In this context, inpainting indoor environments from a single image plays a crucial role in modeling the internal structure of interior spaces, as it enables textured, clutter-free reconstructions. While recent methods have shown significant progress in room modeling, they rely on constraining layout estimators to guide the reconstruction process. These methods are therefore highly dependent on the performance of the structure estimator and on its generative ability in heavily occluded environments. In response to these issues, we propose an approach based on a U-Former architecture and a new Windowed-FourierMixer block, resulting in a unified, single-phase network capable of effectively handling human-made periodic structures such as indoor spaces. This architecture is advantageous for tasks involving indoor scenes, where symmetry is prevalent, allowing the model to capture features such as horizon/ceiling height lines and cuboid-shaped rooms. Experiments show that the proposed approach outperforms current state-of-the-art methods on the Structured3D dataset in both quantitative metrics and qualitative results. Code and models will be made publicly available.
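The abstract does not describe the internals of the Windowed-FourierMixer block, so the following is only a minimal sketch of one plausible reading: split a feature map into non-overlapping windows, mix the tokens of each window in the frequency domain with a 2D FFT, and merge the windows back. The function names, the window size `ws`, and the `spectral_filter` argument (a stand-in for what would presumably be learnable weights) are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def window_partition(x, ws):
    """Split an (H, W, C) feature map into non-overlapping (ws, ws, C) windows."""
    H, W, C = x.shape
    assert H % ws == 0 and W % ws == 0, "H and W must be multiples of ws"
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws, ws, C)

def window_merge(windows, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    ws, C = windows.shape[1], windows.shape[3]
    x = windows.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def windowed_fourier_mix(x, ws, spectral_filter):
    """Hypothetical windowed Fourier mixing (an assumption, not the paper's block).

    spectral_filter has shape (ws, ws // 2 + 1, C) and is multiplied
    element-wise with the spectrum of every window.
    """
    H, W, _ = x.shape
    windows = window_partition(x, ws)
    # Real 2D FFT over the two spatial axes of each window.
    spectrum = np.fft.rfft2(windows, axes=(1, 2))
    # Element-wise filtering in the frequency domain mixes all tokens in a window.
    spectrum = spectrum * spectral_filter
    mixed = np.fft.irfft2(spectrum, s=(ws, ws), axes=(1, 2))
    return window_merge(mixed, H, W)

# Usage: an all-ones filter leaves the feature map unchanged.
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 3))
identity_filter = np.ones((4, 4 // 2 + 1, 3))
output = windowed_fourier_mix(features, 4, identity_filter)
```

A multiplication in the frequency domain corresponds to a circular convolution over the whole window, which is one reason Fourier-based mixing is often associated with periodic or symmetric structure; whether the proposed block exploits this in the same way is not stated in the abstract.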