City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images

From arXiv. Accepted to the USM3D Workshop Proceedings at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 as an oral presentation. Project page: https://citymesh3r.github.io/

City-scale 3D surface reconstruction from multi-view images for downstream 3D simulation is highly challenging due to the scale and complexity of urban scenes. Existing city-scale 3D reconstruction methods based on NeRF, Gaussian Splatting, and similar representations often fail to recover simulation-ready 3D meshes because of incomplete or missing geometry and irregular, noisy surfaces. Scaling existing small-scale 3D reconstruction methods to arbitrarily large urban scenes is infeasible due to their computational complexity. We present City-Mesh3R, a scalable framework for reconstructing watertight surface meshes directly from large unordered image collections. Unlike recent methods that use global sparse SfM point-cloud initialization followed by distributed dense 3D reconstruction of large-scale scenes, our method follows an end-to-end images-to-mesh reconstruction approach using a divide-and-conquer strategy. The sparse city map is reconstructed via topological image clustering, independent per-cluster sparse SfM, and map merging, without the need for exhaustive image feature matching. This map is then partitioned spatially to perform geometry-aware camera selection, followed by dense surface reconstruction and surface refinement using curvature-aware adaptive vertex-density remeshing. The partition meshes are then stitched together to produce the global mesh of the city. The proposed end-to-end framework is evaluated on city-scale reconstruction datasets. Our qualitative and quantitative results show that the method yields high-fidelity watertight 3D meshes with regular geometry that capture fine surface details, and that it is suitable for scaling to arbitrarily large scenes owing to end-to-end processing in a distributed setting.
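To make the "partition the sparse map, then select cameras per partition" step more concrete, here is a minimal sketch under stated assumptions. It is not the authors' method: the abstract does not specify the partition shape or the geometry-aware selection rule. This sketch assumes square tiles on the ground (x, y) plane, an overlap margin so that neighbouring partition meshes can later be stitched, and a simple visibility-count criterion standing in for geometry-aware camera selection. The names `SparsePoint`, `partition_points`, `select_cameras`, `tile_size`, `margin`, and `min_observations` are all hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical data model: a sparse SfM point with its 3D position and the ids
# of the cameras that observe it (i.e. its feature track).
@dataclass
class SparsePoint:
    x: float
    y: float
    z: float
    cameras: frozenset


def partition_points(points, tile_size, margin=0.0):
    """Assign each point to every square ground-plane tile whose bounds,
    expanded by `margin` on all sides, contain it (overlapping partitions)."""
    tiles = defaultdict(list)
    for p in points:
        # Range of tile indices (i, j) whose expanded bounds
        # [i*tile_size - margin, (i+1)*tile_size + margin) contain the point.
        i_lo = math.floor((p.x - margin) / tile_size)
        i_hi = math.floor((p.x + margin) / tile_size)
        j_lo = math.floor((p.y - margin) / tile_size)
        j_hi = math.floor((p.y + margin) / tile_size)
        for i in range(i_lo, i_hi + 1):
            for j in range(j_lo, j_hi + 1):
                tiles[(i, j)].append(p)
    return dict(tiles)


def select_cameras(tile_points, min_observations=2):
    """Keep cameras that observe at least `min_observations` points of a tile.
    This visibility count is an assumed stand-in for geometry-aware selection."""
    counts = defaultdict(int)
    for p in tile_points:
        for cam in p.cameras:
            counts[cam] += 1
    return {cam for cam, n in counts.items() if n >= min_observations}


# Tiny usage example: four points near the boundary between two 10 m tiles.
points = [
    SparsePoint(1.0, 1.0, 0.0, frozenset({"a", "b"})),
    SparsePoint(2.0, 1.5, 0.0, frozenset({"a"})),
    SparsePoint(9.5, 1.0, 0.0, frozenset({"b", "c"})),
    SparsePoint(10.5, 1.0, 0.0, frozenset({"c"})),
]
tiles = partition_points(points, tile_size=10.0, margin=1.0)
# With a 1 m margin, the two points near x = 10 fall into both tiles.
for key in sorted(tiles):
    print(key, len(tiles[key]), sorted(select_cameras(tiles[key])))
```

The overlap margin is what lets each partition be reconstructed independently and still share geometry with its neighbours at the seams. A real pipeline would also need the per-partition dense reconstruction, remeshing, and stitching stages, which are out of scope here.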
