As large language models (LLMs) are adopted across a growing range of applications, their limited factuality and tendency to hallucinate raise significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to improve the retrieval of pertinent passages during in-context learning. The framework draws on the emergent planning capabilities of LLMs, using a divide-and-conquer strategy to break complex queries into manageable sub-queries. For answer selection, it refines self-consistency majority voting by incorporating the recently proposed citation recall and precision metrics to assess the quality of each thought, tying an answer's credibility directly to the quality of the thought that produced it. The result is a weighted majority vote that prioritizes answers according to the citation quality of their thoughts. We also propose a scoring mechanism for retrieved passages that considers citation frequency and quality, self-consistency confidence, and the retrieval module's ranking. Experiments show that HGOT outperforms other retrieval-augmented in-context learning methods, including Demonstrate-Search-Predict (DSP), ReAct, Self-Ask, and Retrieve-then-Read, on different datasets by as much as $7\%$, demonstrating its efficacy in enhancing the factuality of LLMs.
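The citation-weighted majority vote described above can be illustrated with a minimal sketch. The abstract does not give the exact weighting formula, so the code below assumes, purely for illustration, that a thought's quality is the harmonic mean of its citation recall and citation precision; the names `Thought`, `thought_quality`, and `weighted_majority_vote` are hypothetical and not taken from the HGOT implementation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Thought:
    """One sampled reasoning path and the answer it produced (hypothetical type)."""
    answer: str
    citation_recall: float     # assumed to lie in [0, 1]
    citation_precision: float  # assumed to lie in [0, 1]


def thought_quality(t: Thought) -> float:
    """ASSUMPTION: quality is the harmonic mean of citation recall and precision."""
    total = t.citation_recall + t.citation_precision
    if total == 0:
        return 0.0
    return 2 * t.citation_recall * t.citation_precision / total


def weighted_majority_vote(thoughts: list[Thought]) -> tuple[str, float]:
    """Return (answer, confidence), where each vote is weighted by thought quality.

    Confidence is the winning answer's share of the total weight.
    """
    if not thoughts:
        raise ValueError("at least one thought is required")

    scores: dict[str, float] = defaultdict(float)
    for t in thoughts:
        scores[t.answer] += thought_quality(t)

    total = sum(scores.values())
    if total == 0:
        # No thought has usable citations: fall back to an unweighted majority vote.
        answer, count = Counter(t.answer for t in thoughts).most_common(1)[0]
        return answer, count / len(thoughts)

    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Two weakly cited thoughts agree on "Lyon"; one well-cited thought says "Paris".
samples = [
    Thought("Paris", citation_recall=0.9, citation_precision=0.9),
    Thought("Lyon", citation_recall=0.2, citation_precision=0.2),
    Thought("Lyon", citation_recall=0.3, citation_precision=0.3),
]
answer, confidence = weighted_majority_vote(samples)
print(answer, round(confidence, 3))
```

In this toy input, an unweighted vote would pick "Lyon" (two votes to one), whereas the weighted vote favors "Paris" because its single supporting thought carries more citation-quality weight than the two "Lyon" thoughts combined. The confidence value returned here is one plausible stand-in for the self-consistency confidence mentioned in the abstract, not a confirmed definition.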