Retrieval-Augmented Generation (RAG) systems have demonstrated remarkable potential as question answering systems in the K-12 education domain, where knowledge is typically queried within the restricted scope of authoritative textbooks. However, discrepancies between textbooks and the parametric knowledge in Large Language Models (LLMs) could undermine the effectiveness of RAG systems. To systematically investigate the robustness of RAG systems under such knowledge discrepancies, we present EduKDQA, a question answering dataset that simulates knowledge discrepancies in real applications by applying hypothetical knowledge updates to answers and source documents. EduKDQA includes 3,005 questions covering five subjects, organized under a comprehensive question typology from the perspective of context utilization and knowledge integration. We conduct extensive experiments on retrieval and question answering performance. We find that most RAG systems suffer a substantial performance drop in question answering under knowledge discrepancies, and that questions requiring integration of contextual and parametric knowledge pose a particular challenge to LLMs.