Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important path toward Artificial General Intelligence (AGI). However, LLMs are prone to producing factually incorrect information and often generate "hallucinated" content that undermines their reliability, which poses a serious challenge to their deployment in real-world scenarios. Augmenting LLMs with external databases and information retrieval mechanisms is an effective way to mitigate these problems. To address these challenges, we propose a new approach called WeKnow-RAG, which integrates Web search and Knowledge Graphs into a Retrieval-Augmented Generation (RAG) system. First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval. WeKnow-RAG then utilizes domain-specific knowledge graphs to serve a variety of queries and domains, and employs a multi-stage web-page retrieval technique that uses both sparse and dense retrieval methods, improving performance on factual questions and complex reasoning tasks. Our approach effectively balances the efficiency and accuracy of information retrieval, improving the overall retrieval process. Finally, we integrate a self-assessment mechanism that lets the LLM evaluate the trustworthiness of the answers it generates. Our approach demonstrates its effectiveness in a wide range of offline experiments and online submissions.
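The abstract mentions combining sparse and dense retrieval in the multi-stage web-page retrieval step. The paper does not specify the fusion scheme here, so the following is only a minimal illustrative sketch of one common hybrid approach: a small BM25-style sparse scorer, cosine similarity over (assumed precomputed) dense embeddings, and a weighted score fusion. All function names and the fusion weight `alpha` are our own illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Sparse scoring: a minimal BM25 over pre-tokenized documents."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    df = Counter()                      # document frequency per term
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        s = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

def cosine(u, v):
    """Dense scoring: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Fuse min-max-normalized sparse and dense scores; `alpha` weights
    the sparse side. Returns document indices, best first."""
    sparse = bm25_scores(query_tokens, docs_tokens)
    dense = [cosine(query_vec, v) for v in doc_vecs]

    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]

    fused = [alpha * s + (1 - alpha) * d
             for s, d in zip(norm(sparse), norm(dense))]
    return sorted(range(len(docs_tokens)), key=lambda i: fused[i],
                  reverse=True)
```

In a real pipeline the dense vectors would come from an embedding model and the sparse pass would typically act as a cheap first-stage filter before dense re-ranking, which is one way to balance retrieval efficiency and accuracy as the abstract describes.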