Understanding a specific domain through its events is crucial. Extensive event extraction research has been conducted in domains such as news, finance, and biology. However, event extraction in the scientific domain still lacks comprehensive datasets and tailored methods. Compared with other domains, the scientific domain has two characteristics: (1) denser nuggets and events, and (2) more complex information forms. To address this gap while accounting for both characteristics, we first construct SciEvents, a large-scale, multi-event, document-level dataset with a schema tailored to the scientific domain. It consists of 2,508 documents and 24,381 events, produced through multi-stage manual annotation and quality control. We then propose EXCEEDS, an end-to-end scientific event extraction framework that encodes dense nuggets into a grid matrix, thereby recasting complex event extraction as a nugget-based grid modeling task. Experiments on SciEvents show that EXCEEDS achieves state-of-the-art performance. Both the SciEvents dataset and the EXCEEDS framework are publicly released to facilitate future research.
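The abstract does not specify how nuggets are laid out in the grid matrix. As an illustration only, the sketch below assumes a generic span-grid (table-filling) formulation: for a document of n tokens, cell (start, end) of an n x n grid stores the label of the nugget spanning those tokens. This is not the EXCEEDS scheme itself, and the label set `LABELS` and the helpers `encode_grid` / `decode_grid` are hypothetical. It shows why a grid suits dense nuggets: overlapping or nested spans occupy different cells, so they do not compete for the same tag the way they would in a flat per-token sequence labeling.

```python
from typing import List, Tuple

# Hypothetical label set for illustration; the actual SciEvents schema is not reproduced here.
LABELS = ["O", "Trigger", "Argument"]


def encode_grid(n_tokens: int, nuggets: List[Tuple[int, int, str]]) -> List[List[int]]:
    """Encode nugget spans into an n x n grid.

    Cell (start, end) holds the label id of the nugget covering tokens
    start..end (inclusive); 0 means "no nugget". Only the upper triangle
    (start <= end) is used.
    """
    label_id = {name: i for i, name in enumerate(LABELS)}
    grid = [[0] * n_tokens for _ in range(n_tokens)]
    for start, end, label in nuggets:
        if not (0 <= start <= end < n_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        grid[start][end] = label_id[label]
    return grid


def decode_grid(grid: List[List[int]]) -> List[Tuple[int, int, str]]:
    """Recover (start, end, label) spans by scanning the upper triangle."""
    spans = []
    n = len(grid)
    for start in range(n):
        for end in range(start, n):
            if grid[start][end] != 0:
                spans.append((start, end, LABELS[grid[start][end]]))
    return spans


if __name__ == "__main__":
    tokens = ["We", "propose", "a", "grid", "model"]
    grid = encode_grid(len(tokens), [(1, 1, "Trigger"), (3, 4, "Argument")])
    print(decode_grid(grid))  # [(1, 1, 'Trigger'), (3, 4, 'Argument')]
```

In a learned model, a classifier would predict each cell's label from a representation of the token pair, and decoding would proceed as in `decode_grid`; linking nuggets into events would need additional cell types or relations, which this sketch deliberately omits.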