Thermoelectric materials provide a sustainable way to convert waste heat into electricity. However, the lack of a reliable database hinders data-driven discovery and optimization of these materials. Here we develop a comprehensive database of 7,123 thermoelectric compounds that records key information such as chemical composition, structural details, Seebeck coefficient, electrical and thermal conductivity, power factor, and figure of merit (ZT). We used the GPTArticleExtractor workflow, powered by large language models (LLMs), to automatically extract and curate data from scientific literature published in Elsevier journals. This process produced a structured database and addresses the challenges of manual data collection. The open-access database could stimulate data-driven research and advance the analysis and discovery of thermoelectric materials.
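The power factor and ZT stored alongside the transport properties are linked by the standard definitions PF = S²σ and ZT = S²σT/κ, where S is the Seebeck coefficient, σ the electrical conductivity, κ the total thermal conductivity, and T the absolute temperature. Because these quantities are extracted independently from text, a recomputation check is one plausible way to flag inconsistent entries. The following is a minimal sketch under stated assumptions: the record field names and unit choices (µV/K, S/m, W/(m·K), K) are hypothetical and do not reflect the database's actual schema or the GPTArticleExtractor curation procedure.

```python
import math


def power_factor(seebeck_uv_per_k: float, sigma_s_per_m: float) -> float:
    """Power factor PF = S^2 * sigma, returned in W m^-1 K^-2."""
    s = seebeck_uv_per_k * 1e-6  # convert µV/K to V/K
    return s * s * sigma_s_per_m


def zt(seebeck_uv_per_k: float, sigma_s_per_m: float,
       kappa_w_per_mk: float, temperature_k: float) -> float:
    """Dimensionless figure of merit ZT = S^2 * sigma * T / kappa."""
    if kappa_w_per_mk <= 0 or temperature_k <= 0:
        raise ValueError("kappa and T must be positive")
    return power_factor(seebeck_uv_per_k, sigma_s_per_m) * temperature_k / kappa_w_per_mk


def zt_is_consistent(record: dict, rel_tol: float = 0.15) -> bool:
    """Compare a reported ZT with the value recomputed from S, sigma, kappa, T.

    The dictionary keys used here are hypothetical, chosen for illustration only.
    """
    recomputed = zt(record["seebeck_uV_K"], record["sigma_S_m"],
                    record["kappa_W_mK"], record["T_K"])
    return math.isclose(record["ZT"], recomputed, rel_tol=rel_tol)


if __name__ == "__main__":
    # Illustrative values only, not taken from the database.
    sample = {"seebeck_uV_K": 200.0, "sigma_S_m": 1.0e5,
              "kappa_W_mK": 1.5, "T_K": 300.0, "ZT": 0.8}
    print(power_factor(200.0, 1.0e5))  # about 4.0e-3 W m^-1 K^-2
    print(zt_is_consistent(sample))    # True
```

The relative tolerance is deliberately loose, since reported values are often read off plots or rounded; a real curation pipeline would need to settle units and tolerances against the source papers.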