Open Domain Multi-Hop Question Answering (ODMHQA) plays a crucial role in Natural Language Processing (NLP) by aiming to answer complex questions through multi-step reasoning over retrieved information from external knowledge sources. Recently, Large Language Models (LLMs) have demonstrated remarkable performance in solving ODMHQA owing to their capabilities including planning, reasoning, and utilizing tools. However, LLMs may generate off-topic answers when attempting to solve ODMHQA, namely the generated answers are irrelevant to the original questions. This issue of off-topic answers accounts for approximately one-third of incorrect answers, yet remains underexplored despite its significance. To alleviate this issue, we propose the Discriminate->Re-Compose->Re- Solve->Re-Decompose (Dr3) mechanism. Specifically, the Discriminator leverages the intrinsic capabilities of LLMs to judge whether the generated answers are off-topic. In cases where an off-topic answer is detected, the Corrector performs step-wise revisions along the reversed reasoning chain (Re-Compose->Re-Solve->Re-Decompose) until the final answer becomes on-topic. Experimental results on the HotpotQA and 2WikiMultiHopQA datasets demonstrate that our Dr3 mechanism considerably reduces the occurrence of off-topic answers in ODMHQA by nearly 13%, improving the performance in Exact Match (EM) by nearly 3% compared to the baseline method without the Dr3 mechanism.
翻译:摘要:开放域多跳问答(ODMHQA)在自然语言处理(NLP)中扮演着关键角色,其目标是通过对外部知识源检索到的信息进行多步推理来回答复杂问题。近年来,大语言模型(LLMs)因其规划、推理和工具利用能力,在解决ODMHQA方面表现出卓越性能。然而,LLMs在尝试解答ODMHQA时可能生成离题答案,即生成的回答与原始问题无关。这种离题答案问题约占错误答案的三分之一,尽管其重要性显著,却尚未得到充分探究。为缓解该问题,我们提出了判别→重组→重解→重分解(Dr3)机制。具体而言,判别器利用LLMs的固有能力判断生成的答案是否离题。当检测到离题答案时,修正器沿逆向推理链(重组→重解→重分解)逐步修正,直至最终答案回归主题。在HotpotQA和2WikiMultiHopQA数据集上的实验表明,与未使用Dr3机制的基线方法相比,我们的Dr3机制将ODMHQA中的离题答案发生率降低了近13%,并将精确匹配(EM)性能提升了近3%。