Growing renewable penetration introduces substantial uncertainty into power system operations, requiring frequent adaptation of dispatch objectives and constraints and straining expertise-intensive, near-real-time modeling workflows. Large Language Models (LLMs) offer a promising way to automate this process by translating natural-language (NL) operational requirements into executable optimization models through semantic reasoning and code synthesis. Yet existing LLM datasets and benchmarks for optimization modeling primarily target coarse-grained cross-domain generalization and offer little rigorous evaluation in power-system settings, particularly for Optimal Power Flow (OPF). We therefore introduce \textbf{ProOPF-D} and \textbf{ProOPF-B}, a dataset and a benchmark for professional-grade OPF modeling. ProOPF-D contains 12K instances that pair NL requests with parameter adjustments and structural extensions to a canonical OPF, together with executable implementations. ProOPF-B provides 121 expert-annotated test cases with ground-truth code, enabling end-to-end evaluation under both concrete and abstract OPF modeling regimes.