Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering

There has been a growing effort to replace hand extraction of data from research papers with automated data extraction based on natural language processing, language models, and recently, large language models (LLMs). Although these methods enable efficient extraction of data from large sets of research papers, they require a significant amount of up-front effort, expertise, and coding. In this work we propose the ChatExtract method that can fully automate very accurate data extraction with minimal initial effort and background, using an advanced conversational LLM. ChatExtract consists of a set of engineered prompts applied to a conversational LLM that both identify sentences with data, extract that data, and assure the data's correctness through a series of follow-up questions. These follow-up questions largely overcome known issues with LLMs providing factually inaccurate responses. ChatExtract can be applied with any conversational LLMs and yields very high quality data extraction. In tests on materials data we find precision and recall both close to 90% from the best conversational LLMs, like ChatGPT-4. We demonstrate that the exceptional performance is enabled by the information retention in a conversational model combined with purposeful redundancy and introducing uncertainty through follow-up prompts. These results suggest that approaches similar to ChatExtract, due to their simplicity, transferability, and accuracy are likely to become powerful tools for data extraction in the near future. Finally, databases for critical cooling rates of metallic glasses and yield strengths of high entropy alloys are developed using ChatExtract.

翻译：近年来，基于自然语言处理、语言模型以及近期的大语言模型（LLMs）的自动化数据提取方法逐渐取代人工从研究论文中提取数据的工作。尽管这些方法能够高效地从大量研究论文中提取数据，但需要大量前期投入、专业知识和编码工作。在本研究中，我们提出ChatExtract方法，该方法利用先进的对话式LLM，以最少的前期工作与知识背景实现完全自动化的高精度数据提取。ChatExtract包含一组针对对话式LLM设计的工程化提示，能够识别包含数据的句子、提取数据，并通过一系列后续问题确保数据的准确性。这些后续问题在很大程度上克服了LLM可能生成事实性错误响应的已知问题。ChatExtract可应用于任何对话式LLM，并实现高质量的数据提取。在材料数据测试中，我们从最佳对话式LLM（如ChatGPT-4）中获得了接近90%的精确率与召回率。我们证明，这种卓越性能得益于对话模型的信息保留能力，结合有目的的冗余设计以及通过后续提示引入不确定性。这些结果表明，类似ChatExtract的方法因其简便性、可迁移性和准确性，可能在未来成为数据提取的强大工具。最后，本研究利用ChatExtract开发了金属玻璃临界冷却速率与高熵合金屈服强度的数据库。