Neural networks have shown initial promise in automating mathematical theorem proving in proof assistants such as Lean. The same proof assistants can be used to verify the correctness of code by pairing code with specifications and proofs that the specifications hold. Automating the writing of code, specifications, and proofs could lower the cost of verification, or, ambitiously, enable a machine learning system to output provably correct code. However, it remains unclear whether current neural theorem provers can automatically verify even relatively simple programs. We present miniCodeProps, a benchmark of 177 program specifications in the Lean proof assistant, aimed at the subproblem of automatically generating a proof for a provided program and specification. miniCodeProps contains specifications about simple, self-contained programs (e.g., lists, natural numbers, binary trees) with varied proof difficulty. Despite its simplicity, miniCodeProps is challenging for current LLM-based provers, which succeed in proving about 25 percent of the specifications. We publicly release miniCodeProps as a benchmark for furthering automated theorem proving in the context of formally verified code.
翻译:神经网络在自动化数学定理证明方面已展现出初步潜力,特别是在Lean等证明助手中。同样的证明助手可用于验证代码的正确性,方法是将代码与规范及证明规范成立的证明配对。自动化编写代码、规范和证明可以降低验证成本,或者更雄心勃勃地,使机器学习系统能够输出可证明正确的代码。然而,目前尚不清楚当前的神经定理证明器是否能够自动验证即使是相对简单的程序。我们提出了miniCodeProps,这是一个包含177个Lean证明助手中程序规范的基准,旨在解决为给定程序和规范自动生成证明的子问题。miniCodeProps包含关于简单、自包含程序(例如列表、自然数、二叉树)的规范,其证明难度各不相同。尽管miniCodeProps结构简单,但对当前基于LLM的证明器而言仍具有挑战性,这些证明器仅能成功证明约25%的规范。我们公开发布miniCodeProps作为基准,以推动在形式化验证代码背景下的自动定理证明研究。