In verification-aware languages, such as Dafny, despite their critical role, specifications are as prone to error as implementations. Flaws in specifications can result in formally verified programs that deviate from the intended behavior. In this paper, we explore the use of mutation testing to reveal weaknesses in formal specifications written in Dafny. We present MutDafny, a tool that increases the reliability of Dafny specifications by automatically signaling potential weaknesses. Using a mutation testing approach, we introduce faults (mutations) into the code and rely on formal specifications for detecting them. If a program with a mutant verifies, this may indicate a weakness in the specification. We extensively analyze mutation operators from popular tools, identifying the ones applicable to Dafny. In addition, we synthesize new operators tailored for the language from bugfix commits in publicly available Dafny projects on GitHub. Drawing from both, we equipped our tool with a total of 40 mutation operators. We evaluate MutDafny's effectiveness and efficiency on a dataset of 794 real-world Dafny programs, and manually analyze a subset of the resulting undetected mutants, identifying five weak real-world specifications (on average, one at every 241 lines of code) that would benefit from strengthening.
翻译:在诸如Dafny等支持形式化验证的语言中,规范虽扮演关键角色,却与实现代码同样易出错。规范中的缺陷可能导致形式化验证通过的程序偏离预期行为。本文探索使用变异测试来揭示Dafny形式规范中的薄弱环节。我们提出MutDafny工具,通过自动标识潜在缺陷来提升Dafny规范的可靠性。采用变异测试方法,我们在代码中注入故障(变异体),并依赖形式化规范对其进行检测。若含有变异体的程序通过验证,则表明规范可能存在缺陷。我们系统分析了主流工具的变异算子,筛选出适用于Dafny的算子集合。此外,通过分析GitHub上公开Dafny项目的缺陷修复提交,我们合成了针对该语言的定制化新算子。综合上述两类来源,我们的工具共配备了40个变异算子。我们在794个真实Dafny程序组成的数据集上评估了MutDafny的有效性与效率,并人工分析了部分未检测出的变异体,最终识别出5个存在缺陷的真实世界规范(平均每241行代码发现一处),这些规范亟待强化。