Segmenting planar regions from a single RGB image is an important task in the perception of complex scenes. To exploit both the visual and geometric properties of images, recent approaches often formulate the problem as the joint estimation of planar instances and dense depth, using feature fusion mechanisms and geometric constraint losses. Despite promising results, these methods do not consider cross-task feature distillation and perform poorly in boundary regions. To overcome these limitations, we propose X-PDNet, a multitask learning framework for plane instance segmentation and depth estimation, with improvements in two aspects. First, we construct a cross-task distillation design that promotes early information sharing between the two tasks to improve each of them. Second, we highlight the limitations of using the ground-truth boundary to build a boundary regression loss, and propose a novel method that exploits depth information to support precise segmentation in boundary regions. In addition, we manually annotate more than 3000 images from the Stanford 2D-3D-Semantics dataset and make them available for the evaluation of plane instance segmentation. In experiments on the ScanNet and Stanford 2D-3D-S datasets, the proposed methods outperform the baseline by large margins in the quantitative results, demonstrating the effectiveness of our proposals.