We address the fundamental task of inferring cross-document coreference and hierarchy in scientific texts, which has important applications in knowledge graph construction, search, recommendation, and discovery. LLMs can struggle when faced with many long-tail technical concepts that differ in nuanced ways. We present a novel method that generates context-dependent definitions of concept mentions by retrieving full-text literature, and uses these definitions to enhance detection of cross-document relations. We further generate relational definitions, which describe how two concept mentions are related or different, and design an efficient re-ranking approach to address the combinatorial explosion involved in inferring links across papers. In both fine-tuning and in-context learning settings, we achieve large gains in performance. We provide an analysis of generated definitions, shedding light on the relational reasoning ability of LLMs over fine-grained scientific concepts.
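The re-ranking idea above can be illustrated with a minimal sketch: rather than scoring all O(n²) cross-paper mention pairs with an expensive model, a cheap embedding-similarity filter first keeps only the top-k candidates per mention, and the expensive scorer (a stand-in for an LLM-based re-ranker) runs only on that short list. All names, embeddings, and the top-k strategy here are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical blocking + re-ranking sketch (illustrative only; not the
# paper's method). A cheap cosine-similarity pass prunes candidate pairs
# before any expensive LLM-based scoring would run.
from math import sqrt

def cosine(u, v):
    # Standard cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_candidates(query_vec, mention_vecs, k=2):
    """Cheap blocking step: rank all mentions by embedding similarity
    and keep the k best; only these reach the expensive re-ranker."""
    scored = sorted(mention_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]

# Toy embeddings for concept mentions drawn from other papers.
mentions = {
    "BERT-base": [0.9, 0.1, 0.0],
    "RoBERTa":   [0.8, 0.2, 0.1],
    "ResNet-50": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # embedding of the mention we want to link

candidates = top_k_candidates(query, mentions, k=2)
print(candidates)  # → ['BERT-base', 'RoBERTa']
```

The pruning step turns the quadratic pairwise-comparison problem into a near-linear one, at the cost of possibly missing pairs the cheap filter ranks poorly; k trades recall against re-ranking cost.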