Extracting Accurate Materials Data from Research Papers with Conversational Language Models and Prompt Engineering -- Example of ChatGPT

There has been a growing effort to replace hand extraction of data from research papers with automated data extraction based on natural language processing (NLP), language models (LMs), and recently, large language models (LLMs). Although these methods enable efficient extraction of data from large sets of research papers, they require a significant amount of up-front effort, expertise, and coding. In this work we propose the ChatExtract method that can fully automate very accurate data extraction with essentially no initial effort or background using an advanced conversational LLM (or AI). ChatExtract consists of a set of engineered prompts applied to a conversational LLM that both identify sentences with data, extract data, and assure its correctness through a series of follow-up questions. These follow-up questions address a critical challenge associated with LLMs - their tendency to provide factually inaccurate responses. ChatExtract can be applied with any conversational LLMs and yields very high quality data extraction. In tests on materials data we find precision and recall both over 90% from the best conversational LLMs, likely rivaling or exceeding human accuracy in many cases. We demonstrate that the exceptional performance is enabled by the information retention in a conversational model combined with purposeful redundancy and introducing uncertainty through follow-up prompts. These results suggest that approaches similar to ChatExtract, due to their simplicity, transferability and accuracy are likely to replace other methods of data extraction in the near future.

翻译：近年来，人们日益致力于用基于自然语言处理（NLP）、语言模型（LM）以及最近的大型语言模型（LLM）的自动数据提取方法，取代人工从研究论文中提取数据的工作。尽管这些方法能够高效地从大量研究论文中提取数据，但它们需要大量的前期投入、专业知识和编程工作。在本研究中，我们提出了ChatExtract方法，该方法能够完全自动化地实现高度精确的数据提取，且几乎不需要任何初始工作或背景知识，仅需使用先进的对话式LLM（或AI）。ChatExtract包含一组针对对话式LLM设计的工程化提示，这些提示能够识别包含数据的句子、提取数据，并通过一系列追问确保数据的正确性。这些追问解决了LLM面临的一个关键挑战——它们倾向于提供事实上不准确的回答。ChatExtract可适用于任何对话式LLM，并产生非常高质量的数据提取结果。在材料数据测试中，我们发现，使用最优的对话式LLM，精确率和召回率均超过90%，在许多情况下可能媲美甚至超越人类准确性。我们证明，这种卓越性能得益于对话模型的信息保留能力，结合有目的的冗余设计，以及通过追问引入不确定性。这些结果表明，类似于ChatExtract的方法，因其简单性、可迁移性和准确性，很可能在不久的将来取代其他数据提取方法。