With the emergence of Gaussian Splats, recent efforts have focused on the geometric reconstruction of large-scale scenes. However, most of these efforts concentrate on either memory reduction or spatial partitioning, and neglect information in the semantic space. In this paper, we propose a novel method, named SA-GS, for fine-grained 3D geometry reconstruction using semantic-aware 3D Gaussian Splats. Specifically, we leverage prior information stored in large vision models such as SAM and DINO to generate semantic masks. We then introduce a geometric complexity measurement function that serves as a soft regularizer, guiding the shape of each Gaussian Splat within specific semantic areas. Additionally, we present a method that estimates the expected number of Gaussian Splats in each semantic area, effectively providing a lower bound on the number of Gaussian Splats in that area. Subsequently, we extract a point cloud using a novel probability density-based extraction method, transforming the Gaussian Splats into a point cloud, a representation crucial for downstream tasks. Our method also has the potential to support detailed semantic queries while maintaining high image-based reconstruction quality. We provide extensive experiments on publicly available large-scale scene reconstruction datasets that include highly accurate point clouds as ground truth, as well as on our own new dataset. Our results demonstrate that our method outperforms current state-of-the-art Gaussian Splat reconstruction methods by a significant margin on geometry-based metrics. Code and additional results will soon be available on our project page.
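The abstract does not spell out how the probability density-based extraction works, so the following is only a minimal sketch of one plausible reading: each Gaussian Splat is treated as a 3D normal distribution, and points are drawn from it with a per-splat budget proportional to its opacity. The `Splat` fields, the opacity threshold, the proportional budget rule, and the function names are all assumptions made for illustration and are not taken from SA-GS.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Splat:
    """Hypothetical splat record; field names are illustrative only."""
    mean: Vec3       # centre of the Gaussian
    scale: Vec3      # per-axis standard deviations in the local frame
    rotation: Mat3   # 3x3 rotation matrix (rows), local frame -> world
    opacity: float   # in [0, 1]


def sample_point(splat: Splat, rng: random.Random) -> Vec3:
    """Draw one point from N(mean, R diag(scale^2) R^T)."""
    # Sample in the splat's local frame, where the axes are independent.
    local = [s * rng.gauss(0.0, 1.0) for s in splat.scale]
    # Rotate into the world frame and translate by the mean.
    x, y, z = (
        m + sum(r * v for r, v in zip(row, local))
        for m, row in zip(splat.mean, splat.rotation)
    )
    return (x, y, z)


def extract_point_cloud(
    splats: List[Splat],
    budget: int,
    min_opacity: float = 0.05,  # assumed threshold, not from the paper
    seed: int = 0,
) -> List[Vec3]:
    """Sample roughly `budget` points, split across splats by opacity."""
    rng = random.Random(seed)
    # Nearly transparent splats contribute little geometry; skip them.
    kept = [s for s in splats if s.opacity >= min_opacity]
    total = sum(s.opacity for s in kept)
    points: List[Vec3] = []
    if total == 0:
        return points
    for s in kept:
        # Assumed rule: share of the budget proportional to opacity.
        n = round(budget * s.opacity / total)
        points.extend(sample_point(s, rng) for _ in range(n))
    return points
```

Because each share is rounded independently, the returned cloud can differ from `budget` by a few points. A per-semantic-area budget, in the spirit of the lower bound described in the abstract, could be layered on top by calling `extract_point_cloud` once per masked subset of splats.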