The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by \emph{faithfulness}: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce \emph{Bidirectional Provability Fingerprinting} (\bpf{}), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) \emph{Counterfactual Probe Generation} (\cpg{}), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the \emph{Equivalence Spectrum}, a continuous faithfulness score that replaces brittle binary verdicts; (iii) \emph{Adaptive Probe Budget Allocation} (\apba{}), an information-theoretic budget router; and (iv) \emph{Faithfulness-Guided Decoding} (\fgd{}), which uses \bpf{} signals as a reward during autoformalization. We prove a \emph{drift detection theorem} and a \emph{PAC-faithfulness} result establishing that the equivalence class of a natural language statement is learnable from $\mathcal{O}(\log(1/δ)/\varepsilon)$ probes under mild assumptions. We release \driftbench{}, a benchmark of $2{,}183$ NL/Lean~4 pairs with controlled drift labels across six subfields of mathlib4. \bpf{}\,+\,\cpg{} detects $89.6\%$ of drifted formalizations at a $3.0\%$ false-positive rate-against $41.2\%$ for typecheck and $63.3\%$ for LLM-judge baselines, and \fgd{} reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by $47\%$. https://pmlrbd.github.io/BPF/

翻译：自动形式化——将自然语言数学转化为形式证明助手——的瓶颈并非翻译流畅性，而是**忠实性**：一个形式陈述可能类型检查通过且可证明，却编码了与源自然语言意图不同的定理。我们提出**双向可证明性指纹识别**（BPF），该框架通过每个候选者在背景理论中的前向与后向推论邻域来表征其特性，并与从自然语言陈述导出的探针进行匹配，从而认证忠实性。我们进一步引入四个新组件：(i) **反事实探针生成**（CPG），一种对比式过程，可合成针对特定偏移方向的探针；(ii) **等价性谱**，一种连续的忠实性评分，取代脆弱的二元判定；(iii) **自适应探针预算分配**（APBA），一种基于信息论的预算路由器；以及(iv) **忠实性引导解码**（FGD），在自动形式化过程中使用BPF信号作为奖励。我们证明了**偏移检测定理**和**PAC忠实性**结果，表明在温和假设下，自然语言陈述的等价类可从O(log(1/δ)/ε)个探针中学习得到。我们发布**DriftBench**基准测试集，包含2,183个NL/Lean 4配对，覆盖mathlib4六个子领域并带有受控偏移标签。BPF+CPG在3.0%假阳性率下检测出89.6%的偏移形式化(相比之下，类型检查基线和LLM评判基线的检测率分别为41.2%和63.3%)，而FGD将最先进自动形式化器输出偏移陈述的比率降低了47%。https://pmlrbd.github.io/BPF/