Surrogate models are used to predict the behavior of complex energy systems that are too expensive to simulate with traditional numerical methods. Our work introduces the use of language descriptions, which we call "system captions" or SysCaps, to interface with such surrogates. We argue that interacting with surrogates through text, particularly natural language, makes these models more accessible for both experts and non-experts. We introduce a lightweight multimodal text and timeseries regression model and a training pipeline that uses large language models (LLMs) to synthesize high-quality captions from simulation metadata. Our experiments on two real-world simulators of buildings and wind farms show that our SysCaps-augmented surrogates have better accuracy on held-out systems than traditional methods while enjoying new generalization abilities, such as handling semantically related descriptions of the same test system. Additional experiments also highlight the potential of SysCaps to unlock language-driven design space exploration and to regularize training through prompt augmentation.
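As a rough illustration of the caption-synthesis step described above, the sketch below turns simulation metadata into a plain key-value caption and into a prompt that could be passed to an LLM. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the attribute names in `example_metadata`, and the prompt wording are all hypothetical, and no LLM is actually called.

```python
import random


def metadata_to_keyvalue_caption(metadata, rng=None):
    """Render simulator attributes as a 'key: value' caption.

    Hypothetical helper. If `rng` is given, the attribute order is shuffled,
    which is one simple way to produce varied captions for the same system.
    """
    items = list(metadata.items())
    if rng is not None:
        rng.shuffle(items)
    return " | ".join(f"{key.replace('_', ' ')}: {value}" for key, value in items)


def build_llm_prompt(metadata, style="a short paragraph"):
    """Build an instruction asking an LLM to describe the system in natural language.

    The wording is an assumption made for illustration; the prompt is returned
    as a string and is not sent to any model here.
    """
    attributes = metadata_to_keyvalue_caption(metadata)
    return (
        f"Describe the following simulated system in {style}. "
        "Mention every attribute and do not invent new ones.\n"
        f"Attributes: {attributes}"
    )


# Hypothetical building metadata; real simulators expose their own attribute sets.
example_metadata = {
    "building_type": "office",
    "floor_area_m2": 1200,
    "climate_zone": "4A",
}

if __name__ == "__main__":
    print(metadata_to_keyvalue_caption(example_metadata))
    print(build_llm_prompt(example_metadata))
```

In this sketch, the text returned by an LLM for such a prompt would play the role of a natural-language system caption, while the key-value string is a simpler templated alternative.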