Incremental Semantics-Aided Meshing from LiDAR-Inertial Odometry and RGB Direct Label Transfer

High-fidelity geometric mesh reconstruction from LiDAR-inertial scans remains challenging in large, complex indoor environments such as cultural buildings, where point-cloud sparsity, geometric drift, and fixed fusion parameters produce holes, over-smoothing, and spurious surfaces at structural boundaries. We propose a modular, incremental RGB+LiDAR pipeline that builds high-quality, semantics-aided meshes from indoor scans through frame-based direct label transfer. A vision foundation model labels each incoming RGB frame; the labels are incrementally projected and fused onto a LiDAR-inertial odometry map; and an incremental semantics-aware Truncated Signed Distance Function (TSDF) fusion step produces the final mesh via marching cubes. This frame-level fusion strategy preserves the geometric fidelity of LiDAR while using rich visual semantics to resolve the geometric ambiguities that point-cloud sparsity and geometric drift cause at reconstruction boundaries. We demonstrate that semantic guidance improves geometric reconstruction quality; accordingly, we evaluate quantitatively with geometric metrics on the Oxford Spires dataset and analyze results on the NTU VIRAL dataset qualitatively. The proposed method outperforms the state-of-the-art geometric baselines ImMesh and Voxblox, demonstrating the benefit of semantics-aided fusion for geometric mesh quality. The resulting semantically labelled meshes are useful for reconstructing Universal Scene Description (USD) assets, offering a path from indoor LiDAR scanning to XR and digital modeling.
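To make the direct label transfer step concrete, the following is a minimal sketch, not the pipeline's actual implementation. It assumes a pinhole camera with known intrinsics `K`, a known world-to-camera transform `T_cam_world` from the odometry, and a simple majority vote as the incremental fusion rule; the abstract specifies none of these. The function names `transfer_labels` and `fuse_votes` are hypothetical, and occlusion handling is omitted for brevity.

```python
import numpy as np

def transfer_labels(points_world, label_image, K, T_cam_world, unlabeled=-1):
    """Give each LiDAR point the label of the pixel it projects to.

    points_world: (N, 3) map points; label_image: (H, W) integer class ids;
    K: (3, 3) pinhole intrinsics (assumed); T_cam_world: (4, 4) world-to-camera.
    Points behind the camera or outside the image keep the `unlabeled` id.
    """
    n = points_world.shape[0]
    labels = np.full(n, unlabeled, dtype=np.int64)

    # Transform map points into the camera frame.
    pts_h = np.hstack([points_world, np.ones((n, 1))])
    pts_cam = (T_cam_world @ pts_h.T).T[:, :3]
    in_front = pts_cam[:, 2] > 1e-6

    # Pinhole projection to pixel coordinates (u = column, v = row).
    uvw = (K @ pts_cam[in_front].T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(np.int64)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(np.int64)

    h, w = label_image.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    idx = np.flatnonzero(in_front)[inside]
    labels[idx] = label_image[v[inside], u[inside]]
    return labels

def fuse_votes(vote_counts, point_ids, labels):
    """Accumulate one vote per labelled point and return the current winners.

    vote_counts: (num_points, num_classes) array, updated in place.
    Points that have never received a vote default to class 0 in the result.
    """
    valid = labels >= 0
    np.add.at(vote_counts, (point_ids[valid], labels[valid]), 1)
    return vote_counts.argmax(axis=1)
```

Calling `transfer_labels` once per RGB frame and passing its output to `fuse_votes` keeps a running per-point label estimate that a later semantics-aware TSDF stage could consume. A real system would also need a visibility check so that points hidden behind nearer surfaces do not inherit the label of the surface occluding them.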
