Scientific discovery is bottlenecked by manual curation, which cannot keep pace with exponentially growing publication rates, and the result is a widening knowledge gap. The gap is especially stark in photovoltaics, where the leading database for perovskite solar cells has been stagnant since 2021 despite massive ongoing research output. Here, we address this challenge with PERLA, an autonomous, self-updating living database. Our pipeline integrates large language models with physics-aware validation to extract complex device data from the continuous literature stream, achieving human-level precision (>90%) and eliminating annotator variance. Applying the system to the previously inaccessible post-2021 literature, we uncover evolutionary trends hidden by data lag: the field has decisively shifted toward inverted architectures that employ self-assembled monolayers and formamidinium-rich compositions, with a sustained reduction in voltage loss. PERLA transforms static publications into a dynamic knowledge resource, enabling data-driven discovery to operate at the speed of publication.