Mechanistic models encode scientific knowledge about dynamical systems and are widely used in downstream scientific and policy applications. Recent work has explored LLM-based agentic frameworks that automatically construct mechanistic models from data; however, existing problem settings substantially oversimplify real-world conditions, leaving it unclear whether LLM-generated mechanistic models are reliable in practice. To address this gap, we introduce the Neural-Integrated Mechanistic Modeling (NIMM) evaluation framework, which assesses LLM-generated mechanistic models under realistic conditions with partial observations and diversified task objectives. Our evaluation reveals fundamental challenges in current baselines, ranging from model effectiveness to code-level correctness. Motivated by these findings, we design NIMMgen, an agentic framework for neural-integrated mechanistic modeling that enhances code correctness and practical validity through iterative refinement. Experiments on three datasets from diverse scientific domains demonstrate its strong performance. We also show that the learned mechanistic models support counterfactual intervention simulation.