Explicitly modeling room background depth as a geometric constraint has proven effective for panoramic depth estimation. However, reconstructing this background depth for regular enclosed regions in a complex indoor scene without external measurements remains an open challenge. To address this, we propose a pose-aware and geometry-constrained framework for panoramic depth estimation. Our framework first employs multiple task-specific decoders to jointly estimate room layout, camera pose, depth, and region segmentation from an input panoramic image. A pose-aware background depth resolving (PA-BDR) component then uses the task decoders' predictions to resolve the camera pose, computes the background depth of regular enclosed regions from that pose, and treats this background depth as a strong geometric prior. Based on the output of the region segmentation decoder, a fusion mask generation (FMG) component produces a fusion weight map that guides where, and to what extent, the geometry-constrained background depth should correct the depth decoder's prediction. Finally, an adaptive fusion component integrates the refined background depth with the initial depth prediction under the guidance of this fusion weight map. Extensive experiments on the Matterport3D, Structured3D, and Replica datasets demonstrate that our method significantly outperforms current open-source methods. Code is available at https://github.com/emiyaning/PAGCNet.
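The weight-guided adaptive fusion described above can be sketched as a per-pixel convex blend of the two depth maps. The function and variable names below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def adaptive_fusion(depth_pred, depth_bg, fusion_weight):
    """Blend the depth decoder's initial prediction with the
    geometry-constrained background depth, guided by a per-pixel
    fusion weight map (illustrative sketch, not the released code).

    A weight of 1 fully trusts the background depth; 0 keeps the
    initial prediction unchanged.
    """
    w = np.clip(fusion_weight, 0.0, 1.0)
    return w * depth_bg + (1.0 - w) * depth_pred

# Toy 2x2 example: fuse only where the weight map says to.
d_pred = np.array([[2.0, 3.0], [4.0, 5.0]])
d_bg   = np.array([[2.5, 3.0], [4.0, 6.0]])
w_map  = np.array([[1.0, 0.0], [0.5, 1.0]])
fused = adaptive_fusion(d_pred, d_bg, w_map)
# fused == [[2.5, 3.0], [4.0, 6.0]]
```

In practice the weight map would come from the FMG component, so foreground objects keep the decoder's prediction while regular enclosed background regions are corrected toward the geometric prior.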