Computational models have emerged as powerful tools for multi-scale energy modeling research, from the building to the urban scale, supporting data-driven analysis of building and urban energy systems. However, these models require large amounts of building parameter data that are often inaccessible, expensive to collect, or subject to privacy constraints. We introduce a modular, multimodal generative Artificial Intelligence (AI) framework that integrates image, tabular, and simulation-based components to produce synthetic residential building datasets from publicly available county records and images, and we present an end-to-end pipeline that instantiates this framework. To mitigate common Large Language Model (LLM) challenges, we evaluate the pipeline's components using occlusion-based visual focus analysis. This analysis shows that our selected vision-language model achieves significantly stronger visual focus than a GPT-based alternative for building image processing. We also assess the realism of the synthetic data against a national reference dataset: the synthetic data overlap with the reference dataset by more than 65% across all evaluated parameters and by more than 90% for three of the four. This work reduces dependence on costly or restricted data sources, lowering barriers to building-scale energy research and Machine Learning (ML)-driven urban energy modeling, and thereby enabling scalable downstream tasks such as energy modeling, retrofit analysis, and urban-scale simulation under data scarcity.