Select-and-Combine (SAC): A Novel Multi-Stereo Depth Fusion Algorithm for Point Cloud Generation via Efficient Local Markov Netlets

Many practical systems for image-based surface reconstruction employ a stereo/multi-stereo paradigm because it scales to large scenes and is easy to implement for out-of-core operation. In this process, the many depth maps produced by stereo matching must be fused into a single, consistent, and clean point cloud. However, the noise and outliers caused by stereo matching, together with the heterogeneous geometric errors of the poses, present a challenge for existing fusion algorithms: most assume Gaussian errors and predict fused results from data in local spatial neighborhoods, which may inherit uncertainties from multiple depths and thus lower accuracy. In this paper, we propose a novel depth fusion paradigm that, instead of numerically fusing points from multiple depth maps, selects the best depth map per point and combines the selected points into a single, clean point cloud. This paradigm, called select-and-combine (SAC), models point-level fusion using local Markov Netlets, micro-networks over points across neighboring views that perform depth/view selection, followed by a Netlets collapse process for point combination. The Markov Netlets are optimized so that they inherently leverage spatial consistencies among the depth maps of neighboring views, and can therefore address errors beyond Gaussian ones. Our experimental results show that our approach outperforms existing depth fusion approaches, increasing the F1 score, which considers both accuracy and completeness, by 2.07% compared to the best existing method. Finally, our approach generates clearer point clouds that are 18% less redundant while achieving higher accuracy than before fusion.
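
To make the contrast between numerical fusion and selection concrete, the following is a minimal toy sketch, not the method of the paper. It assumes that each 3D point has one candidate depth per neighboring view, and it replaces the Markov Netlet optimization with a simple consensus heuristic (the candidate with the smallest median absolute difference to all candidates wins). The function names `fuse_by_averaging` and `fuse_by_selection` and the sample depths are hypothetical and chosen only for illustration.

```python
import numpy as np


def fuse_by_averaging(depths):
    """Numerical-fusion baseline: mean of all candidate depths for one point."""
    return float(np.mean(depths))


def fuse_by_selection(depths):
    """Select the single candidate that agrees best with the other views.

    Agreement is scored by the median absolute difference to all candidates.
    This is an assumed stand-in heuristic, NOT the paper's Markov Netlet energy.
    Returns (index of the selected view, selected depth).
    """
    depths = np.asarray(depths, dtype=float)
    # Pairwise absolute differences |d_i - d_j| between candidate depths.
    diffs = np.abs(depths[:, None] - depths[None, :])
    # Lower score means the candidate is more consistent with the other views.
    scores = np.median(diffs, axis=1)
    best = int(np.argmin(scores))
    return best, float(depths[best])


# Hypothetical candidates for one point: three consistent views, one outlier.
candidates = [10.02, 9.98, 10.01, 14.50]

mean_depth = fuse_by_averaging(candidates)
view_index, selected_depth = fuse_by_selection(candidates)

print(f"averaged depth: {mean_depth:.4f}")
print(f"selected depth: {selected_depth:.4f} (from view {view_index})")
```

In this toy case the averaged depth is pulled toward the outlier, while the selected depth remains one of the observed, mutually consistent values. This illustrates why a selection scheme does not need to assume Gaussian errors.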
