Rapid acquisition of three-dimensional (3D) building data, including geometric attributes such as rooftop, height, and orientation, as well as indicative attributes such as function, quality, and age, is essential for accurate urban analysis, simulation, and policy updates. Current building datasets, however, offer incomplete coverage of these multiple attributes. This paper introduces a geospatial artificial intelligence (GeoAI) framework for large-scale building modeling and presents the first national-scale Multi-Attribute Building dataset (CMAB), which covers 3,667 spatial cities, 29 million buildings, and 21.3 billion square meters of rooftops, extracted with OCRNet at an F1-score of 89.93%, for a total building stock of 337.7 billion cubic meters. We trained bootstrap-aggregated XGBoost models with city administrative classifications, incorporating features such as morphology, location, and function. Using multi-source data, including billions of high-resolution Google Earth images and 60 million street view images (SVIs), we generated rooftop, height, function, age, and quality attributes for each building. Accuracy was validated against model benchmarks, existing similar products, and manual SVI checks, and is mostly above 80%. The dataset and results are important for the global Sustainable Development Goals (SDGs) and for urban planning.
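As a rough consistency check on the headline figures, the sketch below derives the average height and average rooftop area that the reported totals imply, and shows how an F1-score is computed from detection counts. It assumes that building volume is approximated as rooftop area times height, which the abstract does not state explicitly. The function names are hypothetical, and the derived averages are illustrative arithmetic, not values reported by the authors.

```python
# Headline totals as reported in the abstract.
N_BUILDINGS = 29e6          # number of buildings
ROOFTOP_AREA_M2 = 21.3e9    # total rooftop area, square meters
STOCK_VOLUME_M3 = 337.7e9   # total building stock, cubic meters


def implied_mean_height(volume_m3: float, area_m2: float) -> float:
    """Area-weighted mean height in meters.

    ASSUMPTION: volume = rooftop area x height (not stated in the source).
    """
    return volume_m3 / area_m2


def mean_rooftop_area(area_m2: float, n_buildings: float) -> float:
    """Average rooftop area per building in square meters."""
    return area_m2 / n_buildings


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), equivalently 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    height = implied_mean_height(STOCK_VOLUME_M3, ROOFTOP_AREA_M2)
    area = mean_rooftop_area(ROOFTOP_AREA_M2, N_BUILDINGS)
    print(f"implied mean height: {height:.2f} m")
    print(f"mean rooftop area:   {area:.1f} m^2")
    # Toy counts for illustration only, not taken from the paper.
    print(f"toy F1: {f1_score(tp=8, fp=2, fn=2):.2f}")
```

Under that assumption the totals imply an area-weighted mean height of roughly 15.9 m and a mean rooftop area of roughly 734 square meters per building.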