Few-shot event detection (ED) has been widely studied, but existing work shows noticeable discrepancies, e.g., in motivations, tasks, and experimental settings, that hinder the understanding of models needed for future progress. This paper presents a thorough empirical study, a unified view of ED models, and a better unified baseline. For fair evaluation, we compare 12 representative methods on three datasets; for detailed analysis, we roughly group the methods into prompt-based and prototype-based models. Experiments consistently demonstrate that prompt-based methods, including ChatGPT, still significantly trail prototype-based methods in overall performance. To investigate the superior performance of prototype-based methods, we break down their design elements along several dimensions and build a unified framework on them. Under this unified view, each prototype-based method can be viewed as a combination of different modules from these design elements. We further combine all advantageous modules and propose a simple yet effective baseline, which outperforms existing methods by a large margin (e.g., a 2.7% F1 gain under the low-resource setting).