Over the past decade, several domain-specific languages (DSLs) have been proposed to formally specify requirements stated in legal contracts, mainly for analysis but also for code generation. Symboleo is a promising language in that area. However, writing formal specifications from natural-language contracts is a complex task, especially for legal experts who lack expertise in formal languages. This paper reports on an exploratory experiment targeting the automated generation of Symboleo specifications from business contracts in English using Large Language Models (LLMs). Thirty-eight combinations of prompt components are investigated (with or without the grammar, semantics explanations, 0 to 3 examples, and emotional prompts), mainly on GPT-4o but also, to a lesser extent, on 4 other LLMs. The generated specifications are manually assessed against 16 error types grouped into 3 severity levels. Early results on all LLMs show promising outcomes (even for a little-known DSL) that will likely accelerate the specification of legal contracts. However, several observed issues, especially around grammar/syntax adherence and environment variable identification (49%), point to many areas where potential improvements should be investigated.