Current Large Language Models (LLMs) have shown strong reasoning capabilities on commonsense question answering benchmarks, but the process underlying their success remains largely opaque. As a consequence, recent approaches have equipped LLMs with mechanisms for knowledge retrieval, reasoning and introspection, not only to improve their capabilities but also to enhance the interpretability of their outputs. However, these methods require additional training, hand-crafted templates or human-written explanations. To address these issues, we introduce ZEBRA, a zero-shot question answering framework that combines retrieval, case-based reasoning and introspection, without requiring additional training of the LLM. Given an input question, ZEBRA retrieves relevant question-knowledge pairs from a knowledge base and generates new knowledge by reasoning over the relationships in these pairs. The generated knowledge is then used to answer the input question, improving both the model's performance and the interpretability of its outputs. We evaluate our approach across 8 well-established commonsense reasoning benchmarks, demonstrating that ZEBRA consistently outperforms strong LLMs and previous knowledge integration approaches, achieving an average accuracy improvement of up to 4.5 points.
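The retrieve-then-reason pipeline described above can be sketched in a few lines. This is a minimal, illustrative mock-up, not the authors' implementation: the toy knowledge base, the bag-of-words retriever standing in for a dense retriever, and the prompt wording are all assumptions for the sake of the example.

```python
# Hypothetical sketch of a ZEBRA-style pipeline: retrieve similar
# question-knowledge pairs, then prompt an LLM to generate new knowledge
# and answer. All names, data and prompts below are illustrative.
from collections import Counter

# Toy knowledge base of (question, knowledge) pairs (assumed data).
KB = [
    ("Why do people wear coats in winter?",
     "Coats insulate the body and keep it warm in cold weather."),
    ("What happens to ice in the sun?",
     "Heat from sunlight melts ice into liquid water."),
    ("Why do plants need sunlight?",
     "Plants use sunlight for photosynthesis to produce energy."),
]

def similarity(a: str, b: str) -> float:
    """Bag-of-words overlap as a cheap stand-in for a dense retriever."""
    wa, wb = Counter(a.lower().split()), Counter(b.lower().split())
    overlap = sum((wa & wb).values())
    return overlap / max(1, min(sum(wa.values()), sum(wb.values())))

def retrieve(question: str, k: int = 2):
    """Return the k question-knowledge pairs most similar to the input."""
    return sorted(KB, key=lambda p: similarity(question, p[0]),
                  reverse=True)[:k]

def build_prompt(question: str, choices, examples) -> str:
    """Assemble a prompt that asks the LLM to reason over the retrieved
    pairs, generate new knowledge, and then answer — all zero-shot."""
    demos = "\n".join(f"Q: {q}\nKnowledge: {kn}" for q, kn in examples)
    return (f"{demos}\n\n"
            "Using the examples above, generate relevant knowledge, "
            "then answer.\n"
            f"Q: {question}\nChoices: {', '.join(choices)}\nA:")

examples = retrieve("Why does snow melt in spring?")
prompt = build_prompt("Why does snow melt in spring?",
                      ["rising temperatures", "longer nights"], examples)
```

In the actual framework the final prompt would be sent to the LLM; here the sketch stops at prompt construction, since the retrieval and knowledge-injection steps are the parts the abstract describes.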