Recent work has shown that integrating large language models (LLMs) with theorem provers (TPs) in neuro-symbolic pipelines helps with entailment verification and proof-guided refinement of explanations for natural language inference (NLI). However, scaling such refinement to naturalistic NLI remains difficult: long, syntactically rich inputs and deep multi-step arguments amplify autoformalisation errors, where a single local mismatch can invalidate the proof. Moreover, because the responsible span or step is hard to localise from prover diagnostics, current methods often handle failures via costly global regeneration. To address these problems, we propose a decompose-and-formalise framework that (i) decomposes premise-hypothesis pairs into an entailment tree of atomic steps, (ii) verifies the tree bottom-up to isolate failures to specific nodes, and (iii) performs local diagnostic-guided refinement instead of regenerating the whole explanation. In addition, to improve the faithfulness of autoformalisation, we introduce $\theta$-substitution in an event-based logical form to enforce consistent argument-role bindings. Across a range of reasoning tasks and five LLM backbones, our method achieves the highest explanation verification rates, improving over the state of the art by 26.2%, 21.7%, 21.6%, and 48.9%, while reducing refinement iterations and runtime and preserving strong NLI accuracy.
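To make steps (ii) and (iii) concrete, the following is a minimal sketch of bottom-up verification with local refinement over an entailment tree. The `Node` type and the `verify` and `refine` callables are illustrative assumptions standing in for the prover check and the LLM-based repair, not the paper's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    formula: str                          # autoformalised logical form of this step
    children: List["Node"] = field(default_factory=list)

def verify_bottom_up(
    node: Node,
    verify: Callable[[Node], bool],       # hypothetical: prover call on one step
    refine: Callable[[Node], Node],       # hypothetical: diagnostic-guided local repair
    max_rounds: int = 3,
) -> Optional[Node]:
    """Verify an entailment tree leaves-first; return the first unrepairable node.

    Each child subtree is proved before its parent step is checked, so a
    failure is isolated to the lowest node whose step the prover rejects,
    and only that node is refined rather than the whole explanation.
    """
    for child in node.children:
        failing = verify_bottom_up(child, verify, refine, max_rounds)
        if failing is not None:
            return failing
    if verify(node):
        return None
    for _ in range(max_rounds):           # local refinement loop on this node only
        node = refine(node)
        if verify(node):
            return None
    return node                           # still failing after max_rounds repairs
```

Because verification recurses into children first, a single local mismatch surfaces at the node that introduced it, which is what allows repair to stay local instead of triggering global regeneration.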