miniCodeProps: a Minimal Benchmark for Proving Code Properties

Neural networks have shown initial promise in automating mathematical theorem proving in proof assistants such as Lean. The same proof assistants can be used to verify the correctness of code by pairing code with specifications and proofs that the specifications hold. Automating the writing of code, specifications, and proofs could lower the cost of verification, or, ambitiously, enable a machine learning system to output provably correct code. However, it remains unclear whether current neural theorem provers can automatically verify even relatively simple programs. We present miniCodeProps, a benchmark of 177 program specifications in the Lean proof assistant, aimed at the subproblem of automatically generating a proof for a provided program and specification. miniCodeProps contains specifications about simple, self-contained programs (e.g., lists, natural numbers, binary trees) with varied proof difficulty. Despite its simplicity, miniCodeProps is challenging for current LLM-based provers, which succeed in proving about 25 percent of the specifications. We publicly release miniCodeProps as a benchmark for furthering automated theorem proving in the context of formally verified code.
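To make the task concrete, the following is a minimal Lean 4 sketch of what a program paired with a specification and a proof can look like. It is a hypothetical illustration and is not taken from miniCodeProps: the names `Tree`, `Tree.mirror`, and `Tree.mirror_mirror` are assumptions chosen for this example. In the benchmark's setting, the program and the theorem statement would be given, and the prover would have to produce the proof after `by`.

```lean
-- Hypothetical example, not drawn from the benchmark.
-- A simple, self-contained program: binary trees and a mirror function.
inductive Tree (α : Type) where
  | leaf : Tree α
  | node : Tree α → α → Tree α → Tree α

-- The program: swap the left and right subtrees recursively.
def Tree.mirror {α : Type} : Tree α → Tree α
  | .leaf => .leaf
  | .node l x r => .node (Tree.mirror r) x (Tree.mirror l)

-- The specification: mirroring twice returns the original tree.
-- The proof proceeds by structural induction on the tree.
theorem Tree.mirror_mirror {α : Type} (t : Tree α) :
    t.mirror.mirror = t := by
  induction t with
  | leaf => simp [Tree.mirror]
  | node l x r ihl ihr => simp [Tree.mirror, ihl, ihr]
```

Even in this small case the proof needs an induction and the right rewriting hypotheses, which hints at why proof difficulty can vary widely across properties of simple programs.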

