Proof autoformalization aims to translate a mathematical informal proof written in natural language into a formal proof in a formal language such as Lean~4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.
翻译:证明自动形式化旨在将自然语言编写的非形式化数学证明转换为Lean~4等形式语言中的形式化证明。已有若干研究开发了基于LLM的证明自动形式化模型。然而,现有评估通常聚焦于从精心整理的数据集中翻译结构良好的非形式化证明。我们认为,一个鲁棒的证明自动形式化器必须对偏离这些理想化情形的不规范非形式化证明也能保持忠实性,并首次开展了证明自动形式化模型鲁棒性研究。我们定义了两类扰动并评估其鲁棒性:全局扰动以不同风格改写非形式化证明,在此扰动下形式化结果应保持一致性;局部扰动改变数值、符号或证明步骤(可能以反事实方式),鲁棒的形式化应忠实反映该扰动,而非自行回退至原始结论或推断出其他结果。我们构建了基于miniF2F和MATH-500的扰动基准测试,自动衡量证明自动形式化在全局扰动下正确性的稳定性,以及其输出反映局部扰动的忠实性。评估了七种近期模型,发现所有模型均对全局扰动敏感,且大部分无法在局部扰动下保持忠实性。代码与数据可通过https://github.com/ucr-rai/robust-proof-autoformalization获取。