Retrieval-augmented generation (RAG) has rapidly advanced the language model field, particularly in question-answering (QA) systems. By integrating external documents during response generation, RAG improves the accuracy and reliability of language models: it raises response quality and reduces the frequency of hallucinations, in which the model generates incorrect or misleading information. However, existing RAG methods exhibit limited retrieval accuracy when faced with numerous indistinguishable documents, which poses notable challenges for practical application. In response to these challenges, we present HiQA, an advanced multi-document question-answering (MDQA) framework that integrates cascading metadata into content and employs a multi-route retrieval mechanism. We also release a benchmark, MasQA, to support evaluation and research in MDQA. Finally, HiQA demonstrates state-of-the-art performance in multi-document environments.
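To make the two ideas named above concrete, the following is a minimal sketch, not HiQA's actual implementation. It assumes that "cascading metadata" means prepending a document title and section path to each chunk, and that "multi-route retrieval" means combining several independent scores. The names Chunk, augmented, multi_route_score, and retrieve are hypothetical, the equal weights are arbitrary, and plain token overlap stands in for whatever similarity measures a real system would use.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str      # document-level metadata (hypothetical field)
    section_path: list  # heading hierarchy leading to this chunk (hypothetical)
    text: str           # raw chunk content

    def augmented(self) -> str:
        # Cascade metadata from document level down to section level
        # and prepend it to the chunk content.
        return " > ".join([self.doc_title, *self.section_path]) + "\n" + self.text


def tokens(s: str) -> set:
    # Lowercased word set with surrounding punctuation stripped.
    return {w.strip(".,:;?!()>").lower() for w in s.split()} - {""}


def overlap_score(query: str, text: str) -> float:
    # Fraction of query tokens found in the text; a stand-in for
    # embedding similarity, used here only to keep the sketch stdlib-only.
    q = tokens(query)
    return len(q & tokens(text)) / len(q) if q else 0.0


def multi_route_score(query: str, chunk: Chunk,
                      w_content: float = 0.5, w_meta: float = 0.5) -> float:
    # Route 1: match against the metadata-augmented content.
    content = overlap_score(query, chunk.augmented())
    # Route 2: match against the metadata alone.
    meta = overlap_score(query, " ".join([chunk.doc_title, *chunk.section_path]))
    # Weights are assumed values, not taken from the paper.
    return w_content * content + w_meta * meta


def retrieve(query: str, chunks: list, k: int = 1) -> list:
    # Rank chunks by the combined score and keep the top k.
    return sorted(chunks, key=lambda c: multi_route_score(query, c), reverse=True)[:k]


# Two chunks whose raw text is nearly indistinguishable.
chunks = [
    Chunk("Model A100 datasheet", ["Electrical characteristics"],
          "Maximum operating temperature: 85 C"),
    Chunk("Model B200 datasheet", ["Electrical characteristics"],
          "Maximum operating temperature: 105 C"),
]
query = "What is the maximum operating temperature of model B200?"

top = retrieve(query, chunks)
print(top[0].doc_title)  # prints: Model B200 datasheet
```

On the raw text alone, both chunks match the query equally well, which is the "indistinguishable documents" problem; only the metadata carried into the content and scored on its own route separates them.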