Current event detection research exhibits two notable recurring limitations that we investigate in this study. First, the unidirectional nature of decoder-only LLMs presents a fundamental architectural bottleneck for natural language understanding tasks that depend on rich bidirectional context. Second, we confront the conventional reliance on Micro-F1 scores in the event detection literature, which systematically inflates performance by favoring majority classes; we instead report Macro-F1 as a more representative measure of a model's ability across the long tail of event types. Our experiments demonstrate that models enhanced with sentence context outperform canonical decoder-only baselines. Fine-tuning with Low-Rank Adaptation (LoRA) yields a substantial boost in Macro-F1 in particular, especially for decoder-only models, showing that LoRA can be an effective tool for improving LLM performance on long-tailed event classes.
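The gap between the two metrics can be made concrete with a toy, hypothetical example (the event-type names and counts below are illustrative, not drawn from our data): on a skewed label distribution, a degenerate classifier that always predicts the majority event type scores well on Micro-F1 while Macro-F1 exposes its failure on tail classes. A minimal pure-Python sketch:

```python
from collections import Counter

def f1_scores(y_true, y_pred, labels):
    """Per-class F1, then micro- and macro-averaged F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        per_class[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    # Micro: pool all decisions, so majority classes dominate.
    mtp, mfp, mfn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * mtp / (2 * mtp + mfp + mfn) if mtp else 0.0
    # Macro: every class counts equally, regardless of frequency.
    macro = sum(per_class.values()) / len(labels)
    return micro, macro

# Skewed toy data: 8 "Attack" events vs. 1 "Marry" and 1 "Elect";
# the model predicts the majority class every time.
y_true = ["Attack"] * 8 + ["Marry", "Elect"]
y_pred = ["Attack"] * 10
micro, macro = f1_scores(y_true, y_pred, ["Attack", "Marry", "Elect"])
print(f"Micro-F1: {micro:.3f}")  # 0.800 — looks strong
print(f"Macro-F1: {macro:.3f}")  # 0.296 — exposes the tail failure
```

The same averaging behavior holds for standard implementations such as scikit-learn's `f1_score` with `average="micro"` versus `average="macro"`.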
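For readers unfamiliar with LoRA, the update rule itself is simple: the pretrained weight matrix W (d x k) stays frozen, and only a low-rank pair B (d x r) and A (r x k) is trained, with the effective weight W + (alpha / r) * B @ A. A minimal pure-Python sketch with illustrative, hypothetical dimensions (the values of d, k, r, and alpha below are not our experimental settings):

```python
# LoRA update rule: W_eff = W + (alpha / r) * B @ A, with W frozen.
d, k, r, alpha = 4, 4, 1, 2

W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(d)]  # frozen weight
B = [[0.5] for _ in range(d)]   # d x r, trainable
A = [[0.1, 0.2, 0.3, 0.4]]      # r x k, trainable

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

delta = matmul(B, A)            # rank-r update, shape d x k
scale = alpha / r
W_eff = [[w + scale * dw for w, dw in zip(wr, dr)] for wr, dr in zip(W, delta)]

frozen_params = d * k           # 16 parameters stay frozen
lora_params = d * r + r * k     # only 8 are trained; the ratio shrinks as d, k grow
print(frozen_params, lora_params)
```

Because the number of trainable parameters scales as r * (d + k) rather than d * k, LoRA makes fine-tuning large decoder-only models on modest event detection datasets tractable.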