Answering complex real-world questions in the medical domain often requires accurate retrieval from medical Textual Knowledge Graphs (medical TKGs), because the relational path information in TKGs can enhance the inference ability of Large Language Models (LLMs). However, three bottlenecks remain: existing medical TKGs are scarce, their topological structures have limited expressiveness, and current retrievers for medical TKGs have not been comprehensively evaluated. To address these challenges, we first develop a Dataset for LLMs Complex Reasoning over medical Textual Knowledge Graphs (RiTeK), covering a broad range of topological structures. Specifically, we synthesize realistic user queries that integrate diverse topological structures, relational information, and complex textual descriptions. We conduct a rigorous medical expert evaluation process to assess and validate the quality of the synthesized queries. RiTeK also serves as a comprehensive benchmark dataset for evaluating the capabilities of retrieval systems built upon LLMs. Assessing 11 representative retrievers on this benchmark, we observe that existing methods struggle to perform well, revealing notable limitations in current LLM-driven retrieval approaches. These findings highlight the pressing need for more effective retrieval systems tailored to semi-structured data in the medical domain.