Formal verification of large C programs is impeded by state-space explosion: Bounded Model Checking (BMC) tools must encode the entire state space up to the predetermined bound by unrolling all nested constructs. We present ConVer, a top-down compositional verification tool. Given a C program with a top-level assertion, ConVer decomposes verification top-down: it uses a large language model (LLM) to synthesise function contracts from the system property, then alternates system-level and function-level checks in a CEGAR-CEGIS loop, refining contracts whenever a check fails via SMART ICE learning. We evaluate ConVer on four benchmark suites of increasing difficulty and against other state-of-the-art (SOTA) tools. On the Frama-C benchmark of 45 simple C programs, ConVer achieves 82-96% verification success across three LLM backends, with 93-95% of converged programs requiring only a single CEGAR-CEGIS iteration. On the X.509 parser benchmark (6~programs) and LF2C-Simple suite (17 programs), ConVer achieves 33-50% and 82-88% success respectively. On the VerifyThis suite of 11 recursive and loop-intensive programs, the Pre-Abstraction strategy achieves 55-64% success. In addition, we present ESBMC-LF a preprocessor tool that converts LF models to C while preserving the properties of the LF files, enabling ConVer to verify them. We transpile the LF Verifier Benchmarks using ESBMC-LF to C; we denote those LF-Hard. We show that ConVer successfully verifies 67% of LF-Hard benchmarks overall.
翻译:大规模C程序的形式化验证受限于状态空间爆炸问题:有界模型检测(BMC)工具需通过展开所有嵌套结构,将整个状态空间编码至预定边界。本文提出ConVer——一种自顶向下的组合式验证工具。对于带有顶层断言的C程序,ConVer采用自顶向下的方式分解验证:利用大语言模型(LLM)从系统属性中合成函数合约,随后在CEGAR-CEGIS循环中交替执行系统级与函数级检查,并通过SMART ICE学习方法在检查失败时精炼合约。我们在四个难度递增的基准套件上对ConVer进行评估,并与当前最先进(SOTA)工具进行对比。在包含45个简单C程序的Frama-C基准上,ConVer在三种LLM后端下实现了82-96%的验证成功率,其中93-95%的收敛程序仅需单次CEGAR-CEGIS迭代。在X.509解析器基准(6个程序)和LF2C-Simple套件(17个程序)中,ConVer分别达到33-50%和82-88%的成功率。在包含11个递归与循环密集型程序的VerifyThis套件中,预抽象策略实现了55-64%的成功率。此外,我们提出ESBMC-LF预处理器工具,该工具将LF模型转换为C语言并保持LF文件的属性,从而使ConVer能够验证它们。我们利用ESBMC-LF将LF验证器基准转译为C语言,并命名为LF-Hard。实验表明,ConVer在整体上成功验证了67%的LF-Hard基准。