Recent work in Event Argument Extraction (EAE) has focused on improving model generalizability to new events and domains. However, standard benchmark datasets such as ACE and ERE cover fewer than 40 event types and 25 entity-centric argument roles. This limited diversity and coverage prevents them from adequately evaluating the generalizability of EAE models. In this paper, we first contribute a large and diverse EAE ontology. We create this ontology by adapting FrameNet, a comprehensive semantic role labeling (SRL) dataset, to EAE, exploiting the similarity between the two tasks. We then collect exhaustive human expert annotations to build the ontology, which comprises 115 events and 220 argument roles, a significant portion of which are not entities. We use this ontology to introduce GENEVA, a diverse generalizability benchmark dataset comprising four test suites designed to evaluate models' ability to handle limited data and to generalize to unseen event types. We benchmark six EAE models from various families. The results show that, owing to non-entity argument roles, even the best-performing model achieves an F1 score of only 39%, indicating that GENEVA poses new challenges for generalization in EAE. Overall, our large and diverse EAE ontology can aid in creating more comprehensive future resources, while GENEVA is a challenging benchmark that encourages further research on improving generalizability in EAE. The code and data can be found at https://github.com/PlusLabNLP/GENEVA.