Entity linking (EL) is the task of linking a textual mention to its corresponding entry in a knowledge base, and is critical for many knowledge-intensive NLP applications. When applied to tables in scientific papers, EL is a step toward large-scale scientific knowledge bases that could enable advanced scientific question answering and analytics. We present the first dataset for EL in scientific tables. EL for scientific tables is especially challenging because scientific knowledge bases can be very incomplete, and disambiguating table mentions typically requires understanding the paper's text in addition to the table. Our dataset, S2abEL, focuses on EL in machine learning results tables and includes hand-labeled cell types, attributed sources, and entity links from the PaperswithCode taxonomy for 8,429 cells from 732 tables. We introduce a neural baseline method designed for EL on scientific tables containing many out-of-knowledge-base mentions, and show that it significantly outperforms a state-of-the-art generic table EL method. The best baselines fall below human performance, and our analysis highlights avenues for improvement.