Large language models (LLMs) have shown impressive abilities in answering questions across various domains, but they often hallucinate on questions that require professional and up-to-date knowledge. To address this limitation, retrieval-augmented generation (RAG) techniques have been proposed, which retrieve relevant information from external sources to ground the model's responses. However, existing RAG methods typically focus on a single type of external data, such as vectorized text databases or knowledge graphs, and cannot adequately handle real-world questions over semi-structured data that contains both text and relational information. To bridge this gap, we introduce PASemiQA, a novel approach that jointly leverages the text and relational information in semi-structured data to answer questions. PASemiQA first generates a plan that identifies the text and relational information in the semi-structured data relevant to the question, and then uses an LLM agent to traverse the data and extract the necessary information. Our empirical results demonstrate the effectiveness of PASemiQA on semi-structured datasets from several domains, showcasing its potential to improve the accuracy and reliability of question answering systems on semi-structured data.
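The plan-then-traverse idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` structure, the toy knowledge base `KB`, the relation name `written_by`, and the keyword-based `make_plan` stub are all assumptions made for illustration. In an actual system, both the planning step and the traversal decisions would presumably be driven by an LLM rather than by fixed rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy semi-structured knowledge base: free text plus typed edges."""
    node_id: str
    node_type: str
    text: str
    # Maps a relation name to the ids of neighboring nodes (hypothetical schema).
    edges: dict = field(default_factory=dict)


# Toy data, invented for this example only.
KB = {
    "p1": Node("p1", "paper", "A paper on QA over semi-structured data.",
               {"written_by": ["a1", "a2"]}),
    "a1": Node("a1", "author", "Author A works on retrieval."),
    "a2": Node("a2", "author", "Author B works on knowledge graphs."),
}


def make_plan(question: str) -> list:
    """Hypothetical planner stub: return the relation path to follow.

    A keyword rule stands in for the LLM call a real planner would make.
    """
    if "author" in question.lower():
        return ["written_by"]
    return []


def traverse(kb: dict, start_id: str, plan: list) -> list:
    """Follow each relation in the plan from the start node and collect node text."""
    frontier = [start_id]
    for relation in plan:
        # Expand every node in the frontier along the current relation.
        frontier = [
            neighbor
            for node_id in frontier
            for neighbor in kb[node_id].edges.get(relation, [])
        ]
    return [kb[node_id].text for node_id in frontier]


question = "Who are the authors of p1?"
evidence = traverse(KB, "p1", make_plan(question))
print(evidence)
```

The design point the sketch tries to capture is the separation of concerns: the plan names which relations matter for the question, and the traversal uses that plan to reach the nodes whose text is then passed on as evidence.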