Large language models (LLMs) have demonstrated remarkable capabilities in various scientific domains, from natural language processing to complex problem-solving tasks. Their ability to understand and generate human-like text has opened up new possibilities for advancing scientific research, enabling tasks such as data analysis, literature review, and even experimental design. One of the most promising applications of LLMs in this context is hypothesis generation, where they can identify novel research directions by analyzing existing knowledge. However, despite their potential, LLMs are prone to generating ``hallucinations'', outputs that are plausible-sounding but factually incorrect. Such a problem presents significant challenges in scientific fields that demand rigorous accuracy and verifiability, potentially leading to erroneous or misleading conclusions. To overcome these challenges, we propose KG-CoI (Knowledge Grounded Chain of Ideas), a novel system that enhances LLM hypothesis generation by integrating external, structured knowledge from knowledge graphs (KGs). KG-CoI guides LLMs through a structured reasoning process, organizing their output as a chain of ideas (CoI), and includes a KG-supported module for the detection of hallucinations. With experiments on our newly constructed hypothesis generation dataset, we demonstrate that KG-CoI not only improves the accuracy of LLM-generated hypotheses but also reduces the hallucination in their reasoning chains, highlighting its effectiveness in advancing real-world scientific research.
翻译:大型语言模型(LLMs)在从自然语言处理到复杂问题求解的众多科学领域中展现出卓越能力。其理解和生成类人文本的潜力为推进科学研究开辟了新途径,使得数据分析、文献综述乃至实验设计等任务成为可能。在此背景下,LLMs最具前景的应用之一是假设生成——通过分析现有知识识别新颖的研究方向。然而,尽管潜力巨大,LLMs容易产生“幻觉”,即生成看似合理但事实错误的输出。这一问题在要求严格准确性与可验证性的科学领域构成重大挑战,可能导致错误或误导性结论。为应对这些挑战,我们提出KG-CoI(基于知识的思维链),一种通过整合知识图谱(KGs)中的外部结构化知识来增强LLM假设生成的新系统。KG-CoI通过结构化推理流程引导LLMs,将其输出组织为思维链(CoI),并包含一个支持知识图谱的幻觉检测模块。基于我们新构建的假设生成数据集的实验表明,KG-CoI不仅能提升LLM生成假设的准确性,还能减少其推理链中的幻觉现象,彰显了其在推进现实世界科学研究中的有效性。