Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming

Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*. Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux, to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem -- producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker. Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding in that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compare favorably with large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.

翻译：面向证明的程序将计算内容与程序正确性的证明相结合。然而，尽管在F*等语言中使用可满足性模理论（SMT）求解器来自动化证明，编程和证明所需的人力仍然巨大。为了推动利用人工智能自动化构建面向证明的程序的研究，我们整理了一个包含60万行开源F*程序及证明的数据集，其中涵盖了从Windows、Linux到Python和Firefox等生产系统中使用的软件。该数据集包含约3.2万个顶层F*定义，每个定义代表一个类型导向的程序与证明合成问题——即根据以F*类型表示的形式规范生成定义。我们提供了一个程序片段检查器，可查询F*以验证候选方案的正确性。我们相信这是最大的SMT辅助程序证明语料库，并配有可复现的程序片段检查器。基于该数据集，我们探索了利用人工智能在F*中合成程序及其证明的方法，并取得了有前景的结果。主要发现是：微调后的较小语言模型（如Phi-2或StarCoder）的性能可与大型语言模型（如GPT-4）相媲美，且计算成本显著更低。我们还识别了多种基于类型的检索增强技术，发现它们能大幅提升性能。通过详细的错误分析和案例研究，我们揭示了模型与技术的潜在优劣，并为未来改进指明了方向。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日