Chemical synthesis is central to materials development and drug discovery, and it affects sectors ranging from environmental science to healthcare. The growing use of technology in chemistry has generated large volumes of chemical data, making it harder for researchers to discern patterns and refine synthesis processes. Artificial intelligence (AI) can help by analyzing these data to optimize syntheses and increase yields. However, AI struggles to process literature data because chemical literature is unstructured and written in diverse styles. To overcome these difficulties, we introduce an end-to-end AI agent framework capable of high-fidelity extraction from large bodies of chemical literature. The agent uses large language models (LLMs) to generate prompts and to optimize them iteratively. Acting as a chemistry assistant, it automates data collection and analysis, reducing manual effort and improving performance. We evaluate the framework's efficacy using the accuracy, recall, and F1 score of the extracted reaction condition data, and we compare our method with human experts in terms of content correctness and time efficiency. The proposed approach marks a significant advance in automating chemical literature extraction and demonstrates the potential of AI to transform how data are managed and used in chemistry.