In materials science, effective information retrieval systems are essential for facilitating research. Traditional Retrieval-Augmented Generation (RAG) approaches built on Large Language Models (LLMs) often suffer from outdated information, hallucinations, limited interpretability due to context constraints, and inaccurate retrieval. Graph RAG addresses these issues by integrating graph databases into the retrieval process. Our proposed method processes materials science documents by extracting key entities (referred to as MatIDs) from sentences and using them to query external Wikipedia knowledge bases (KBs) for additional relevant information. We implement an agent-based parsing technique to obtain a more detailed representation of the documents. Our improved version of Graph RAG, called G-RAG, further uses a graph database to capture the relationships between these entities, improving both retrieval accuracy and contextual understanding. This enhanced approach demonstrates significant performance improvements in domains that require precise information retrieval, such as materials science.
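The pipeline described above (entity extraction, knowledge-base lookup, and graph-based retrieval) can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary `MAT_VOCAB`, the in-memory `TOY_KB` standing in for Wikipedia, the co-occurrence edges, and all function names are assumptions made for illustration. A real system would presumably use an LLM agent or NER model for extraction and an actual graph database for storage.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical vocabulary of material entities (stand-in for agent-based extraction).
MAT_VOCAB = {"graphene", "silicon", "perovskite", "titanium dioxide"}

# Hypothetical in-memory knowledge base (stand-in for external Wikipedia queries).
TOY_KB = {
    "graphene": "Graphene is a single layer of carbon atoms in a hexagonal lattice.",
    "silicon": "Silicon is a chemical element widely used as a semiconductor.",
}


def extract_mat_ids(sentence):
    """Return the vocabulary terms found in a sentence, sorted alphabetically."""
    lowered = sentence.lower()
    return sorted(
        term for term in MAT_VOCAB
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )


def build_graph(sentences):
    """Link entities that co-occur in the same sentence (undirected edges)."""
    graph = defaultdict(set)
    for sentence in sentences:
        ids = extract_mat_ids(sentence)
        for entity in ids:
            graph[entity]  # make sure isolated entities still appear as nodes
        for a, b in combinations(ids, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def retrieve_context(query, graph, kb):
    """Collect KB entries for the query's entities and their graph neighbours."""
    seeds = extract_mat_ids(query)
    related = set(seeds)
    for seed in seeds:
        related |= graph.get(seed, set())
    return {entity: kb[entity] for entity in sorted(related) if entity in kb}


# Toy corpus, invented for this example.
docs = [
    "Graphene and silicon are combined in hybrid photodetectors.",
    "Perovskite solar cells often use titanium dioxide as an electron transport layer.",
]
graph = build_graph(docs)
context = retrieve_context("What is graphene?", graph, TOY_KB)
for entity, summary in context.items():
    print(f"{entity}: {summary}")
```

In this sketch a query that mentions only graphene also retrieves the silicon entry, because the two entities are linked in the graph. This is the kind of relationship-aware expansion that a graph-backed retriever is meant to add on top of plain text matching.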