With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations have emerged as significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrieval of pertinent passages during in-context learning. The framework leverages the emergent planning capabilities of LLMs, employing a divide-and-conquer strategy to break down complex queries into manageable sub-queries. It refines self-consistency majority voting for answer selection by incorporating the recently proposed citation recall and precision metrics to assess the quality of thoughts, intrinsically linking an answer's credibility to the quality of the thoughts that produced it. This methodology introduces a weighted system in majority voting, prioritizing answers based on the citation quality of their thoughts. Additionally, we propose a scoring mechanism for evaluating retrieved passages that considers factors such as citation frequency and quality, self-consistency confidence, and the retrieval module's ranking. Experiments indicate that HGOT excels as a versatile approach, outperforming competing models on FEVER by up to $7\%$ and matching leading models such as Retrieve-then-Read on Open-SQuAD and DSP on HotPotQA, demonstrating its efficacy in enhancing LLMs' factuality.
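The citation-weighted voting idea above can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the use of a citation-F1 score as the vote weight, and the sample values are all illustrative assumptions — the sketch only shows how weighting votes by citation quality can let one well-cited thought outweigh several poorly cited ones.

```python
from collections import defaultdict

def citation_f1(recall: float, precision: float) -> float:
    """Harmonic mean of citation recall and precision (0.0 if both are 0)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

def weighted_majority_vote(samples):
    """Select the answer whose sampled thoughts carry the most citation quality.

    `samples` is a list of (answer, citation_recall, citation_precision)
    tuples, one per sampled reasoning chain (thought). Each thought's vote
    is weighted by its citation quality instead of counting equally.
    """
    scores = defaultdict(float)
    for answer, recall, precision in samples:
        scores[answer] += citation_f1(recall, precision)
    return max(scores, key=scores.get)

# Hypothetical samples: one well-cited thought beats two weakly cited ones,
# even though plain majority voting would pick the more frequent answer.
samples = [
    ("Paris", 0.9, 0.8),
    ("Lyon",  0.2, 0.3),
    ("Lyon",  0.1, 0.2),
]
print(weighted_majority_vote(samples))  # → Paris
```

Under unweighted self-consistency, "Lyon" would win 2–1; weighting each vote by citation quality flips the outcome toward the better-grounded answer.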