Modelica is a widely adopted language for modeling and simulating complex physical systems, yet creating and optimizing effective models requires substantial domain expertise. Although large language models (LLMs) have demonstrated promising capabilities in code generation, their application to physical system modeling remains largely unexplored. To address this gap, we develop benchmark datasets specifically designed to evaluate the performance of LLMs in generating Modelica component models and test cases. Our evaluation reveals substantial limitations in current LLMs: the generated code frequently fails to simulate successfully. To overcome these challenges, we propose a specialized workflow that integrates supervised fine-tuning, graph retrieval-augmented generation, and feedback optimization to improve the accuracy and reliability of Modelica code generation. The evaluation results demonstrate significant performance gains: pass@1 improves by up to 0.3349 on the component generation task and by up to 0.2457 on the test case generation task. This research underscores the potential of LLMs to advance intelligent modeling tools and offers valuable insights for future developments in system modeling and engineering applications.
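For readers unfamiliar with the pass@1 metric cited above, the sketch below shows how pass@k is commonly estimated in code generation studies (the unbiased estimator popularized by the HumanEval benchmark). The abstract does not state which estimator is used here, so treat this as an illustrative assumption; in this setting a sample would count as passing only if the generated Modelica model compiles and simulates successfully.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: candidate programs sampled per problem
    c: candidates that pass (e.g. the Modelica model simulates successfully)
    k: evaluation budget (k = 1 for pass@1)
    """
    if n - c < k:
        return 1.0  # enough passing samples that any k-subset contains one
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 candidates per problem, 3 simulate successfully.
print(pass_at_k(n=10, c=3, k=1))  # 0.30
```

Averaging this quantity over all benchmark problems gives the reported pass@1 score, so an improvement of 0.3349 corresponds to roughly 33 percentage points more problems solved on the first attempt.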