In the rapidly evolving field of data science, navigating the large and growing body of academic literature efficiently is crucial for informed decision-making and innovation. This paper presents an enhanced Retrieval-Augmented Generation (RAG) application, an artificial intelligence (AI)-based system designed to help data scientists find precise and contextually relevant academic resources. The application combines GeneRation Of BIbliographic Data (GROBID) for extracting bibliographic information, fine-tuned embedding models, semantic chunking, and an abstract-first retrieval method to improve the relevance and accuracy of the retrieved information. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework shows substantial improvements in key metrics, particularly Context Relevance, indicating that the system is effective at reducing information overload and supporting decision-making. Our findings highlight the potential of this enhanced RAG system to transform academic exploration within data science and to streamline research and innovation workflows in the field.
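The abstract names an abstract-first retrieval method without describing its mechanics. As a rough illustration only, the idea might be sketched as a two-stage search: rank papers by how well their abstracts match the query, then rank text chunks only within the papers that survive the first stage. Everything below is an assumption made for illustration and is not taken from the paper: the function names `embed`, `cosine`, and `abstract_first_retrieve`, the toy corpus, and the bag-of-words vector, which stands in for the fine-tuned embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def abstract_first_retrieve(query, papers, top_papers=1, top_chunks=2):
    """Hypothetical two-stage retrieval.

    Stage 1 ranks papers by abstract similarity to the query.
    Stage 2 ranks the chunks of the retained papers only.
    """
    q = embed(query)
    ranked = sorted(
        papers, key=lambda p: cosine(q, embed(p["abstract"])), reverse=True
    )
    candidates = [
        (cosine(q, embed(chunk)), paper["title"], chunk)
        for paper in ranked[:top_papers]
        for chunk in paper["chunks"]
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(title, chunk) for _, title, chunk in candidates[:top_chunks]]


# Hypothetical toy corpus; a real system would hold chunks of parsed papers.
PAPERS = [
    {
        "title": "Paper A",
        "abstract": "Gradient boosting for tabular data classification.",
        "chunks": [
            "We train gradient boosting trees on tabular data.",
            "Hyperparameters were tuned by grid search.",
        ],
    },
    {
        "title": "Paper B",
        "abstract": "Transformer models for machine translation.",
        "chunks": [
            "Attention layers encode the source sentence.",
            "We report BLEU on translation benchmarks.",
        ],
    },
]

if __name__ == "__main__":
    for title, chunk in abstract_first_retrieve(
        "gradient boosting on tabular data", PAPERS
    ):
        print(title, "|", chunk)
```

Filtering on abstracts first keeps chunks from off-topic papers out of the candidate pool, which is one plausible way such a method could raise context relevance. Whether the paper's system works this way is not stated in the abstract.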