Photovoltaic (PV) power forecasting plays a critical role in power system dispatch and market participation. Because PV generation is highly sensitive to weather conditions and cloud motion, accurate forecasting requires effective modeling of complex spatiotemporal dependencies across multiple information sources. Although recent studies have advanced AI-based forecasting methods, most do not fuse temporal observations, satellite imagery, and textual weather information in a unified framework. This paper proposes Solar-VLM, a large-language-model-driven framework for multimodal PV power forecasting. First, modality-specific encoders are developed to extract complementary features from heterogeneous inputs. The time-series encoder adopts a patch-based design to capture temporal patterns from the multivariate observations at each site. The visual encoder, built upon a Qwen-based vision backbone, extracts cloud-cover information from satellite images. The text encoder distills historical weather characteristics from textual descriptions. Second, to capture spatial dependencies across geographically distributed PV stations, a cross-site feature fusion mechanism is introduced. Specifically, a Graph Learner models inter-station correlations through a graph attention network constructed over a K-nearest-neighbor (KNN) graph, while a cross-site attention module facilitates adaptive information exchange among sites. Finally, experiments on data from eight PV stations in a northern province of China demonstrate the effectiveness of the proposed framework. The proposed model is publicly available at https://github.com/rhp413/Solar-VLM.
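As an illustration of two building blocks named above, the following is a minimal sketch of how a KNN graph over stations and a patch-based view of a time series could be constructed. It is not the authors' implementation: the function names `knn_adjacency` and `make_patches`, the use of planar Euclidean distance between site coordinates, and the values of `k`, `patch_len`, and `stride` are all assumptions made for illustration only.

```python
import numpy as np


def knn_adjacency(coords: np.ndarray, k: int) -> np.ndarray:
    """Return a symmetric 0/1 adjacency matrix for a KNN graph.

    coords: (n_sites, 2) array of hypothetical site coordinates.
    k: number of nearest neighbors per site (assumed hyperparameter, k < n_sites).
    """
    n = coords.shape[0]
    # Pairwise Euclidean distances (a planar approximation, assumed here).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a site is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k closest sites
    adj = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adj[rows, nearest.ravel()] = 1
    # Symmetrize so an edge exists if either site selects the other.
    return np.maximum(adj, adj.T)


def make_patches(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a (T, C) multivariate series into overlapping patches.

    Returns an array of shape (num_patches, patch_len, C).
    """
    n_steps = series.shape[0]
    starts = range(0, n_steps - patch_len + 1, stride)
    return np.stack([series[s:s + patch_len] for s in starts])


# Hypothetical inputs: eight sites with random coordinates, and one site's
# series of 96 time steps with 5 variables.
rng = np.random.default_rng(0)
site_coords = rng.uniform(0.0, 1.0, size=(8, 2))
adjacency = knn_adjacency(site_coords, k=3)
patches = make_patches(np.zeros((96, 5)), patch_len=16, stride=8)
```

In a pipeline of the kind the abstract describes, the adjacency matrix would define which stations a graph attention layer may attend over, and each patch would become one input token for the time-series encoder; both roles are stated here as assumptions about a typical design.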