While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs". This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that answers are supported by interpretable chains of reasoning drawn from a consistent network of beliefs. Our approach, which we call REFLEX, is to add a rational, self-reflecting layer on top of the LLM. First, given a question, we construct a belief graph using a backward-chaining process to materialize relevant model beliefs (including beliefs about answer candidates) and their inferential relationships. Second, we identify and minimize contradictions in that graph using a formal constraint reasoner. We find that REFLEX significantly improves consistency (by 8%-11% absolute) without harming overall answer accuracy, resulting in answers supported by faithful chains of reasoning drawn from a more consistent belief system. This suggests a new style of system architecture in which an LLM extended with a rational layer can provide an interpretable window into system beliefs, add a systematic reasoning capability, and repair latent inconsistencies present in the LLM.
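To make the two steps concrete, the following is a minimal sketch of the idea, not the REFLEX implementation. It assumes a tiny hand-written belief graph in place of one materialized from an LLM by backward chaining, and it substitutes brute-force enumeration for the formal constraint reasoner. All statements, confidences, weights, and function names (`beliefs`, `rules`, `mutex`, `cost`, `repair`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical beliefs: statement -> (model's initial truth value, confidence in that value).
# In the described approach these would be elicited from the LLM; here they are made up.
beliefs = {
    "giraffes are mammals": (True, 0.9),
    "mammals give live birth": (True, 0.8),
    "giraffes give live birth": (False, 0.3),  # conflicts with the two beliefs above
    "giraffes lay eggs": (True, 0.2),
}

# Inferential relationships: (premises, conclusion, weight).
# A rule is violated when every premise is true and the conclusion is false.
rules = [
    (("giraffes are mammals", "mammals give live birth"), "giraffes give live birth", 1.0),
]

# Mutual-exclusion constraints: (a, b, weight), violated when both are true.
mutex = [("giraffes give live birth", "giraffes lay eggs", 1.0)]


def cost(assignment):
    """Penalty for flipped beliefs plus penalty for violated constraints."""
    total = 0.0
    for statement, (value, confidence) in beliefs.items():
        if assignment[statement] != value:
            total += confidence  # flipping a confident belief is expensive
    for premises, conclusion, weight in rules:
        if all(assignment[p] for p in premises) and not assignment[conclusion]:
            total += weight
    for a, b, weight in mutex:
        if assignment[a] and assignment[b]:
            total += weight
    return total


def repair():
    """Return the truth assignment with the lowest total cost (exhaustive search)."""
    names = list(beliefs)
    candidates = (
        dict(zip(names, values))
        for values in product([True, False], repeat=len(names))
    )
    return min(candidates, key=cost)


if __name__ == "__main__":
    for statement, value in repair().items():
        print(f"{value!s:5}  {statement}")
```

In this toy graph the cheapest repair flips the two weakly held beliefs rather than the strongly held premises, so the revised answer is backed by an explicit chain from premises to conclusion. Exhaustive search is exponential in the number of beliefs, which is why a realistic system would hand the same weighted constraints to a dedicated solver.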