Stakeholders often describe system requirements using natural language which are then converted to formal syntax by a domain-expert leading to increased design costs. This paper assesses the capabilities of Large Language Models (LLMs) in converting between natural language descriptions and formal specifications. Existing work has evaluated the capabilities of LLMs in generating formal syntax such as source code but such experiments are typically hand-crafted and use problems that are likely to be in the training set of LLMs, and often require human-annotated datasets. We propose an approach that can use two copies of an LLM in conjunction with an off-the-shelf verifier to automatically evaluate its translation abilities without any additional human input. Our approach generates formal syntax using language grammars to automatically generate a dataset. We conduct an empirical evaluation to measure the accuracy of this translation task and show that SOTA LLMs cannot adequately solve this task, limiting their current utility in the design of complex systems.
翻译:利益相关者通常使用自然语言描述系统需求,随后由领域专家转换为形式化语法,这导致设计成本增加。本文评估了大语言模型在自然语言描述与形式化规约之间进行转换的能力。现有工作已评估过LLM生成源代码等形式化语法的能力,但这类实验通常为人工构建,使用的往往是可能存在于LLM训练集中的问题,且常需人工标注数据集。我们提出一种方法,可结合两个LLM副本与现成的验证器,在无需额外人工输入的情况下自动评估其翻译能力。该方法利用语言语法生成形式化语法以自动构建数据集。我们通过实证评估衡量该翻译任务的准确率,结果表明当前最先进的LLM无法充分解决该任务,限制了其在复杂系统设计中的实际应用价值。