Definitions are the foundation of any scientific work, but with the rapid growth in publication volume, gathering the definitions relevant to a given keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra & DefSim, two novel datasets of human-extracted definitions and human-annotated definition-pair similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction quality, we test various metrics and show that an NLI-based method yields the most reliable results. We find that LLMs are largely able to extract definitions from scientific literature (recovering 86.4% of the definitions in our test set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as models tend to over-generate them. Code & datasets are available at https://github.com/Media-Bias-Group/SciDef.