Writing declarative models has numerous benefits, ranging from automated reasoning and correction of design-level properties before systems are built, to automated testing and debugging of their implementations after they are built. Alloy is a declarative modeling language that is well-suited for verifying system designs. A key strength of Alloy is its scenario-finding toolset, the Analyzer, which allows users to explore all valid scenarios that adhere to the model's constraints up to a user-provided scope. However, even with visualized scenarios, it is difficult to write correct Alloy models. To address this, a growing body of work explores different techniques for debugging Alloy models. In order to develop and evaluate these techniques in an effective manor, this paper presents an empirical study of over 97,000 models written by novice users trying to learn Alloy. We investigate how users write both correct and incorrect models in order to produce a comprehensive benchmark for future use as well as a series of observations to guide debugging and educational efforts for Alloy model development.
翻译:编写声明式模型具有诸多优势,从在系统构建前实现设计级属性的自动化推理与修正,到系统构建后对其实现的自动化测试与调试均涵盖其中。Alloy是一种适用于验证系统设计的声明式建模语言,其核心优势在于配备了场景查找工具集——即分析器,该工具允许用户在用户指定的范围内探索所有符合模型约束的有效场景。然而,即便借助可视化场景,编写正确的Alloy模型仍具挑战性。为此,学界涌现出多种面向Alloy模型调试的技术探索。为有效开发与评估这些技术,本文对97,000余个由试图学习Alloy的新手用户编写的模型开展了实证研究。我们通过分析用户编写正确与错误模型的过程,旨在构建一套可供后续使用的综合性基准测试集,同时提供一系列指导性观察结论,为Alloy模型开发的调试工作与教学实践提供支持。