While large language models (LLMs) are proficient at question-answering (QA), the dependencies between their answers and other "beliefs" they may have about the world are typically unstated, and may even be in conflict. Our goal is to uncover such dependencies and reduce inconsistencies among them, so that answers are supported by faithful, system-believed chains of reasoning drawn from a consistent network of beliefs. Our approach, which we call REFLEX, is to add a "rational", self-reflecting layer on top of the LLM. First, given a question, we construct a belief graph using a backward-chaining process to materialize relevant model "beliefs" (including beliefs about answer candidates) and the inferential relationships between them. Second, we identify and minimize contradictions in that graph using a formal constraint reasoner. We find that REFLEX significantly improves consistency (by 8%-11% absolute) without harming overall answer accuracy, resulting in answers supported by faithful chains of reasoning drawn from a more consistent belief system. This suggests a new style of system architecture, in which an LLM extended with a rational layer of self-reflection can repair latent inconsistencies within the LLM alone.
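The second step described above, minimizing contradictions in a belief graph, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three statements, their confidence weights, the single rule, and the function names `cost`, `violated_rules`, and `repair` are hypothetical and do not come from the paper. The belief graph is written by hand rather than built by backward-chaining over an LLM, and an exhaustive search over truth assignments stands in for the formal constraint reasoner, so this only scales to toy graphs.

```python
from itertools import product

# Hypothetical toy beliefs: statement -> (initial truth value, confidence).
# Values are invented for illustration; a real system would query the LLM.
beliefs = {
    "a sparrow is a bird": (True, 0.9),
    "birds have feathers": (True, 0.9),
    "a sparrow has feathers": (False, 0.6),
}

# Hypothetical inferential rule: (premises, conclusion, confidence in the rule).
rules = [
    (("a sparrow is a bird", "birds have feathers"),
     "a sparrow has feathers", 0.95),
]


def violated_rules(assignment):
    """Return rules whose premises all hold while the conclusion is false."""
    return [
        rule for rule in rules
        if all(assignment[p] for p in rule[0]) and not assignment[rule[1]]
    ]


def cost(assignment):
    """Sum the weights of flipped beliefs and of violated rules."""
    flipped = sum(
        weight
        for statement, (initial_value, weight) in beliefs.items()
        if assignment[statement] != initial_value
    )
    broken = sum(rule[2] for rule in violated_rules(assignment))
    return flipped + broken


def repair():
    """Exhaustively search for the lowest-cost truth assignment.

    Brute force is a stand-in for a formal constraint reasoner and is
    only practical for a handful of beliefs.
    """
    names = list(beliefs)
    best = None
    for values in product([True, False], repeat=len(names)):
        candidate = dict(zip(names, values))
        if best is None or cost(candidate) < cost(best):
            best = candidate
    return best


initial = {statement: value for statement, (value, _) in beliefs.items()}
repaired = repair()
```

In this toy graph the initial beliefs violate the one rule, and the cheapest repair flips the low-confidence belief about sparrows instead of giving up either high-confidence premise. Weighting both beliefs and rules is one plausible design choice for letting the reasoner trade off which side of a contradiction to revise.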