Agentic Scientific Simulation: Execution-Grounded Model Construction and Reconstruction

LLM agents are increasingly used for code generation, but physics-based simulation poses a deeper challenge: natural-language descriptions of simulation models are inherently underspecified, and different admissible resolutions of implicit choices produce physically valid but scientifically distinct configurations. Without explicit detection and resolution of these ambiguities, neither the correctness of the result nor its reproducibility from the original description can be assured. This paper investigates agentic scientific simulation, where model construction is organized as an execution-grounded interpret-act-validate loop and the simulator serves as the authoritative arbiter of physical validity rather than merely a runtime. We present JutulGPT, a reference implementation built on the fully differentiable Julia-based reservoir simulator JutulDarcy. The agent combines structured retrieval of documentation and examples with code synthesis, static analysis, execution, and systematic interpretation of solver diagnostics. Underspecified modelling choices are detected explicitly and resolved either autonomously (with logged assumptions) or through targeted user queries. The results demonstrate that agent-mediated model construction can be grounded in simulator validation, while also revealing a structural limitation: choices resolved tacitly through simulator defaults are invisible to the assumption log and to any downstream representation. A secondary experiment, in which the agent autonomously reconstructs a reference model from progressively more abstract textual descriptions, shows that reconstruction variability exposes latent degrees of freedom in simulation descriptions and provides a practical methodology for auditing reproducibility. All code, prompts, and agent logs are publicly available.
