Contemporary language models enable new opportunities for structured reasoning with text, such as the construction and evaluation of intuitive, proof-like textual entailment trees without relying on brittle formal logic. However, progress in this direction has been hampered by the long-standing lack of a clear protocol for determining what constitutes valid compositional entailment. This absence leads to noisy datasets and limits the performance gains of modern neuro-symbolic engines. To address these problems, we formulate a consistent and theoretically grounded approach to annotating decompositional entailment datasets, and evaluate its impact on LLM-based textual inference. We find that our resulting dataset, RDTE (Recognizing Decompositional Textual Entailment), has substantially higher internal consistency (+9%) than prior decompositional entailment datasets, suggesting that RDTE is a significant step forward in the long-standing problem of forming a clear protocol for discerning entailment. We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in a modern neuro-symbolic reasoning engine significantly improves results (both accuracy and proof quality) over other entailment classifier baselines, illustrating the practical benefit of this advance for textual inference.