In this paper, we introduce Segmentation-Driven Deformation Multi-View Stereo (SD-MVS), a method that effectively tackles the challenges of 3D reconstruction in textureless areas. We are the first to adopt the Segment Anything Model (SAM) to distinguish semantic instances in scenes, and we leverage these segmentation constraints to drive pixelwise patch deformation in both matching cost computation and propagation. In addition, we propose a refinement strategy that combines spherical coordinates and gradient descent for normals with pixelwise search intervals for depths, significantly improving the completeness of the reconstructed 3D model. Furthermore, we adopt the Expectation-Maximization (EM) algorithm to alternately optimize the aggregate matching cost and the hyperparameters, which mitigates the excessive dependence of parameters on empirical tuning. Evaluations on the ETH3D high-resolution multi-view stereo benchmark and the Tanks and Temples dataset demonstrate that our method achieves state-of-the-art results in less time.
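To illustrate why a spherical parameterization is convenient for refining normals by gradient descent, the following minimal sketch represents a unit normal by two angles and descends on a toy cost. It is not the SD-MVS implementation: the function names, the step size, the iteration count, and the cost itself (one minus the dot product with a known target normal, standing in for a real matching cost) are all assumptions made for illustration.

```python
import math


def normal_from_angles(theta, phi):
    """Unit normal from polar angle theta and azimuth phi (hypothetical helper)."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))


def toy_cost(theta, phi, target):
    """Stand-in cost: 1 - n . target. A real MVS pipeline would use a matching cost."""
    n = normal_from_angles(theta, phi)
    return 1.0 - sum(a * b for a, b in zip(n, target))


def refine_normal(theta, phi, target, lr=0.5, steps=500):
    """Plain gradient descent on (theta, phi); lr and steps are illustrative values."""
    for _ in range(steps):
        st, ct = math.sin(theta), math.cos(theta)
        sp, cp = math.sin(phi), math.cos(phi)
        # Analytic partial derivatives of the normal with respect to each angle.
        dn_dtheta = (ct * cp, ct * sp, -st)
        dn_dphi = (-st * sp, st * cp, 0.0)
        # Gradient of the toy cost 1 - n . target.
        g_theta = -sum(a * b for a, b in zip(dn_dtheta, target))
        g_phi = -sum(a * b for a, b in zip(dn_dphi, target))
        theta -= lr * g_theta
        phi -= lr * g_phi
    return theta, phi


if __name__ == "__main__":
    target = (0.0, 0.6, -0.8)  # assumed ground-truth unit normal
    theta, phi = refine_normal(2.0, 1.0, target)
    print(normal_from_angles(theta, phi), toy_cost(theta, phi, target))
```

Because the normal is rebuilt from two angles at every step, it stays unit length by construction, so the update needs no explicit renormalization or constraint handling. The parameterization is singular at the poles, where the azimuth has no effect on the normal.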