Accurate heat-demand maps play a crucial role in decarbonizing space heating, yet most municipalities lack the detailed building-level data needed to produce them. We introduce HeatPrompt, a zero-shot vision-language energy-modeling framework that estimates annual heat demand from semantic features extracted from satellite imagery, basic Geographic Information System (GIS) data, and building-level attributes. We prompt pretrained large vision-language models (VLMs) with a domain-specific prompt to act as an energy planner and extract visual attributes from RGB satellite images, such as roof age and building density, that correspond to thermal load. A multi-layer perceptron (MLP) regressor trained on these captions achieves an $R^2$ uplift of 93.7% and reduces the mean absolute error (MAE) by 30% compared to the baseline model. Qualitative analysis shows that high-impact tokens align with high-demand zones, offering lightweight support for heat planning in data-scarce regions.
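The downstream stage described above (VLM captions → features → MLP regressor) can be sketched in miniature. Everything below is illustrative: the keyword vocabulary, the captions, and the heat-demand targets are invented for the example, and the pure-Python MLP stands in for whatever regressor HeatPrompt actually uses.

```python
# Hypothetical sketch: turn VLM captions into binary keyword features,
# then fit a tiny one-hidden-layer MLP regressor on heat-demand targets.
import random

# Assumed attribute tokens a VLM caption might mention (not from the paper).
VOCAB = ["old roof", "dense", "detached", "flat roof", "insulated"]

def featurize(caption):
    """Binary bag-of-keywords vector from a VLM caption."""
    c = caption.lower()
    return [1.0 if kw in c else 0.0 for kw in VOCAB]

class TinyMLP:
    """One-hidden-layer ReLU MLP regressor trained with plain per-sample SGD."""
    def __init__(self, n_in, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, y

    def predict(self, x):
        return self._forward(x)[1]

    def train(self, X, y, epochs=500, lr=0.05):
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                err = p - t  # squared-error gradient (up to a factor of 2)
                for j in range(len(self.w2)):
                    grad_h = err * self.w2[j] * (1.0 if h[j] > 0 else 0.0)
                    self.w2[j] -= lr * err * h[j]
                    for i in range(len(x)):
                        self.w1[j][i] -= lr * grad_h * x[i]
                    self.b1[j] -= lr * grad_h
                self.b2 -= lr * err

# Toy captions with synthetic targets (annual demand scaled to hundreds of kWh/m^2).
captions = [
    "old roof, dense block of flats",
    "insulated detached house, flat roof",
    "dense terraced housing with old roof",
    "well insulated, detached",
]
targets = [1.8, 0.7, 1.7, 0.6]

X = [featurize(c) for c in captions]
mlp = TinyMLP(n_in=len(VOCAB))
mlp.train(X, targets)

# Captions signalling poor fabric should score higher than insulated ones.
pred_high = mlp.predict(featurize("old roof, dense"))
pred_low = mlp.predict(featurize("insulated detached"))
```

In the real pipeline one would embed the captions (or one-hot their extracted attributes) rather than keyword-match, and train on measured building-level demand; the sketch only shows the data flow.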