Recent studies have increasingly applied natural language processing (NLP) to automatically extract experimental research data from the extensive battery materials literature. Although battery manufacturing is a complex process that spans material synthesis through cell assembly, no comprehensive study has systematically organized this information. In response, we propose a language modeling-based protocol, Text-to-Battery Recipe (T2BR), for the automatic extraction of end-to-end battery recipes, and validate it with a case study on batteries containing LiFePO4 cathode material. We report machine learning-based paper filtering models that screen 2,174 relevant papers from keyword-based search results, and unsupervised topic models that identify 2,876 paragraphs related to cathode synthesis and 2,958 paragraphs related to cell assembly. Focusing on these two topics, we then develop two deep learning-based named entity recognition models to extract a total of 30 entities, including precursors, active materials, and synthesis methods, achieving F1 scores of 88.18% and 94.61%. The accurate extraction of entities enables the systematic generation of 165 end-to-end recipes of LiFePO4 batteries. Our protocol and results offer valuable insights into specific trends, such as associations between precursor materials and synthesis methods, or combinations of different precursor materials. We anticipate that our findings will serve as a foundational knowledge base for facilitating battery-recipe information retrieval. The proposed protocol will significantly accelerate the review of battery material literature and catalyze innovations in battery design and development.
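The final step of the protocol, combining extracted entities into end-to-end recipes, can be illustrated with a minimal sketch. It is not the T2BR implementation: it assumes that a recipe is formed when one paper contributes entities from both the cathode synthesis and the cell assembly topics. The `Entity` class, the `assemble_recipes` function, the label names, and the sample entities are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    """One extracted entity. Field and label names are hypothetical."""
    paper_id: str
    topic: str   # "cathode_synthesis" or "cell_assembly"
    label: str   # e.g. "PRECURSOR", "SYNTHESIS_METHOD"
    text: str


def assemble_recipes(entities):
    """Group entities by paper, topic, and label.

    Assumption: a paper yields an end-to-end recipe only if it has
    entities from both topics. The abstract does not state this rule.
    """
    by_paper = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for e in entities:
        by_paper[e.paper_id][e.topic][e.label].append(e.text)

    recipes = {}
    for paper_id, topics in by_paper.items():
        if "cathode_synthesis" in topics and "cell_assembly" in topics:
            # Convert nested defaultdicts to plain dicts for the result.
            recipes[paper_id] = {t: dict(labels) for t, labels in topics.items()}
    return recipes


# Illustrative, made-up entities (not data from the study).
sample = [
    Entity("p1", "cathode_synthesis", "PRECURSOR", "Li2CO3"),
    Entity("p1", "cathode_synthesis", "PRECURSOR", "FeC2O4"),
    Entity("p1", "cathode_synthesis", "PRECURSOR", "NH4H2PO4"),
    Entity("p1", "cathode_synthesis", "SYNTHESIS_METHOD", "solid-state reaction"),
    Entity("p1", "cell_assembly", "BINDER", "PVDF"),
    Entity("p1", "cell_assembly", "ELECTROLYTE", "LiPF6"),
    Entity("p2", "cathode_synthesis", "PRECURSOR", "LiOH"),  # no cell assembly
]

recipes = assemble_recipes(sample)
print(recipes)
```

In this sketch, paper `p2` is dropped because it has no cell assembly entities. Grouping by label within each topic also makes it straightforward to count co-occurrences, such as which precursors appear alongside which synthesis methods.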