Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proof assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not be optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, so we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.
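To illustrate the kind of rewrite that length optimization targets, here is a minimal toy sketch in Lean 4 (not an example from the paper): a lemma first proved by explicit induction, then rewritten as a one-tactic proof. The lemma and both proofs are illustrative assumptions, not ImProver outputs.

```lean
-- A step-by-step proof of commutativity of addition on Nat.
theorem add_comm_example (a b : Nat) : a + b = b + a := by
  induction a with
  | zero => simp
  | succ n ih => rw [Nat.succ_add, ih, Nat.add_succ]

-- A length-optimized rewrite of the same lemma: the `omega`
-- decision procedure closes linear-arithmetic goals directly.
theorem add_comm_example' (a b : Nat) : a + b = b + a := by
  omega
```

Both proofs are correct; an optimizer scoring proof length would prefer the second, while one scoring readability or pedagogical structure might prefer the first.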