Accurate three-dimensional urban data are critical for climate modelling, disaster risk assessment, and urban planning, yet they remain scarce because existing approaches either rely on proprietary sensors or generalise poorly across cities. We propose GeoFormer, an open-source Swin Transformer framework that jointly estimates building height (BH) and building footprint (BF) on a 100 m grid using only Sentinel-1/2 imagery and open DEM data. A geo-blocked splitting strategy ensures strict spatial independence between training and test sets. Evaluated over 54 diverse cities, GeoFormer achieves a BH RMSE of 3.19 m and a BF RMSE of 0.05, improvements of 7.5% and 15.3%, respectively, over the strongest CNN baseline, while keeping BH RMSE below 3.5 m in cross-continent transfer. Ablation studies confirm that DEM input is indispensable for height estimation and that optical reflectance contributes more than SAR, although multi-source fusion yields the best overall accuracy. All code, weights, and global products are publicly released.
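The geo-blocked splitting idea can be illustrated with a minimal sketch. The abstract does not state the block size, the test fraction, or how blocks are assigned, so the function name `geo_block_split` and the parameters `block_size`, `test_fraction`, and `seed` are assumptions chosen for illustration, not the authors' implementation. The sketch groups 100 m grid cells into larger square blocks and sends whole blocks to either the training or the test set, so that no block contributes cells to both.

```python
import numpy as np


def geo_block_split(x, y, block_size=5000.0, test_fraction=0.2, seed=0):
    """Return a boolean mask that is True for test samples.

    x, y          : projected coordinates (metres) of each grid-cell centre.
    block_size    : side length of a spatial block in metres (assumed value).
    test_fraction : share of blocks (not cells) held out for testing (assumed).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Integer index of the spatial block that each cell falls into.
    bx = np.floor(x / block_size).astype(np.int64)
    by = np.floor(y / block_size).astype(np.int64)
    blocks = np.stack([bx, by], axis=1)

    # Unique blocks, plus the index of the owning block for every cell.
    uniq, inverse = np.unique(blocks, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)

    # Randomly pick whole blocks for the test set.
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * len(uniq))))
    test_blocks = rng.choice(len(uniq), size=n_test, replace=False)

    is_test_block = np.zeros(len(uniq), dtype=bool)
    is_test_block[test_blocks] = True

    # Every cell inherits the assignment of its block.
    return is_test_block[inverse]
```

Because the split is made at block level, neighbouring cells from the same block never straddle the train/test boundary, which reduces leakage from spatial autocorrelation. Cells on either side of a block edge can still be adjacent, so a stricter variant might also drop a buffer of cells around test blocks.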