We report on using an agentic coding assistant (Claude Code, powered by Claude Opus 4.6) to mechanize a substantial Rocq correctness proof from scratch, with human guidance but without human proof writing. The proof establishes semantic preservation for the administrative normal form (ANF) transformation in the CertiCoq verified compiler for Rocq. The closely related continuation-passing style (CPS) transformation in CertiCoq was previously proved correct by human experts over several months. We use this proof as a template and instruct the LLM to adapt the proof technique to the ANF setting, which differs in important technical ways. The resulting ANF proof comprises approximately 7,800 lines of Rocq (larger than the 5,300-line CPS proof) and was developed in approximately 96 hours. We describe the proof technique and report on the experience of developing it with an LLM, discussing both the strengths and limitations of the approach and its implications for verified compiler construction.