LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. We explore the potential of retrieval-augmented generation (RAG) models for long-form question answering (LFQA) in a specialized knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. Human evaluations by computational linguists and domain experts show that the RAG model significantly outperforms the standard model, producing more factual and comprehensive responses with fewer hallucinations. In addition, a keyword-based hybrid retriever that emphasizes unique low-frequency terms further improves results. Our study provides insights into effectively integrating modern large language models with ancient knowledge systems. Project page with dataset and code: https://sites.google.com/view/vedantany-10m
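The abstract's keyword-based hybrid retriever is described in full in the project code; as a minimal illustrative sketch (not the authors' implementation), one common way to emphasize unique low-frequency terms is to fuse an IDF-weighted keyword overlap score with a dense-embedding similarity score. All names and the toy corpus below are hypothetical.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """Inverse document frequency: terms appearing in few documents get high weight."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc.lower().split()))
    return {t: math.log((n + 1) / (df[t] + 0.5)) for t in df}

def keyword_score(query, doc, idf):
    """Sum the IDF of query terms found in the document,
    so rare (low-frequency) terms dominate the score."""
    doc_terms = set(doc.lower().split())
    return sum(idf.get(t, 0.0) for t in set(query.lower().split()) if t in doc_terms)

def hybrid_rank(query, corpus, dense_scores, alpha=0.5):
    """Rank documents by a weighted sum of normalized keyword
    and (precomputed) dense similarity scores."""
    idf = idf_weights(corpus)
    kw = [keyword_score(query, d, idf) for d in corpus]
    m = max(kw) or 1.0  # avoid division by zero when no terms match
    fused = [alpha * (k / m) + (1 - alpha) * s for k, s in zip(kw, dense_scores)]
    return sorted(range(len(corpus)), key=lambda i: fused[i], reverse=True)

# Toy transcript snippets and dense similarities (as if from an embedding model).
corpus = [
    "the self is brahman says advaita vedanta",
    "the talk covers daily practice and meditation",
    "nirvikalpa samadhi is discussed in this discourse",
]
dense = [0.4, 0.5, 0.3]
order = hybrid_rank("what is nirvikalpa samadhi", corpus, dense)
# The rare terms "nirvikalpa" and "samadhi" pull document 2 to the top
# despite its lowest dense score.
```

The fusion weight `alpha` trades off exact rare-term matching against semantic similarity; in practice the dense scores would come from an embedding model rather than being supplied by hand.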