Formal languages are an integral part of modeling and simulation. They allow the distillation of knowledge into concise simulation models amenable to automatic execution, interpretation, and analysis. However, arguably the most accessible means for humans to express models is natural language, which is not easily interpretable by computers. Here, we evaluate how a Large Language Model (LLM) might be used to formalize natural language into simulation models. Existing studies have only explored very large LLMs, such as the commercial GPT models, without fine-tuning model weights. To close this gap, we show how an open-weights, 7B-parameter Mistral model can be fine-tuned to translate natural language descriptions into reaction network models in a domain-specific language, offering a self-hostable, compute-efficient, and memory-efficient alternative. To this end, we develop a synthetic data generator that serves as the basis for fine-tuning and evaluation. Our quantitative evaluation shows that the fine-tuned Mistral model recovers the ground-truth simulation model in up to 84.5% of cases. In addition, a small-scale user study demonstrates the model's practical potential for one-time generation as well as interactive modeling across various domains. While promising, in its current form the fine-tuned small LLM cannot match large LLMs. We conclude that higher-quality training data are required, and expect future small, open-source LLMs to offer new opportunities.