Natural Language Inference (NLI) is the task of determining whether a premise entails, contradicts, or is neutral with respect to a given hypothesis. The task is often framed as emulating human inferential processes, in which commonsense knowledge plays a major role. This study examines whether Large Language Models (LLMs) can generate useful commonsense axioms for Natural Language Inference, and evaluates their impact on performance using the SNLI and ANLI benchmarks with the Llama-3.1-70B and gpt-oss-120b models. We show that a hybrid approach, which selectively provides highly factual axioms based on judged helpfulness, yields consistent accuracy improvements of 1.99% to 6.88% across tested configurations, demonstrating the effectiveness of selective knowledge access for NLI. We also find that this targeted use of commonsense knowledge helps models overcome a bias toward the Neutral class by providing essential real-world context.