Retrieval-augmented generation (RAG) has emerged as a powerful framework for enhancing large language models (LLMs) with external knowledge, particularly in scientific domains that demand specialized and dynamic information. Despite its promise, the application of RAG in the chemistry domain remains underexplored, primarily due to the lack of high-quality, domain-specific corpora and well-curated evaluation benchmarks. In this work, we introduce ChemRAG-Bench, a comprehensive benchmark designed to systematically assess the effectiveness of RAG across a diverse set of chemistry-related tasks. The accompanying chemistry corpus integrates heterogeneous knowledge sources, including scientific literature, the PubChem database, PubMed abstracts, textbooks, and Wikipedia entries. In addition, we present ChemRAG-Toolkit, a modular and extensible RAG toolkit that supports five retrieval algorithms and eight LLMs. Using ChemRAG-Toolkit, we demonstrate that RAG yields a substantial performance gain -- achieving an average relative improvement of 17.4% over direct inference methods. We further conduct in-depth analyses of retriever architectures, corpus selection, and the number of retrieved passages, culminating in practical recommendations to guide future research and deployment of RAG systems in the chemistry domain. The code and data are available at https://chemrag.github.io.