Validation is a central activity when developing formal specifications. Similarly to coding, a possible validation technique is to define upfront test cases or scenarios that a future specification should (or should not) satisfy. Unfortunately, specifying such test cases is burdensome and error-prone, which could cause users to skip this validation task. This paper reports the results of an empirical evaluation of using pre-trained large language models (LLMs) to automate the generation of test cases from natural language requirements. In particular, we focus on test cases for structural requirements of simple domain models formalized in the Alloy specification language. Our evaluation focuses on the state-of-the-art GPT-5 model, but results from other closed- and open-source LLMs are also reported. The results show that, in this context, GPT-5 is already quite effective at generating positive (and negative) test cases that are syntactically correct and that satisfy (or violate) the given requirement, and that can detect many wrong specifications written by humans.