We present CorPipe 24, the winning entry to the CRAC 2024 Shared Task on Multilingual Coreference Resolution. In this third iteration of the shared task, a novel objective is to also predict the empty nodes needed for zero coreference mentions (whereas in previous years the empty nodes were given on input). This way, coreference resolution can be performed on raw text. We evaluate two model variants: a two-stage approach (where the empty nodes are predicted first using a pretrained encoder model and then processed together with sentence words by another pretrained model) and a single-stage approach (where a single pretrained encoder model generates empty nodes, coreference mentions, and coreference links jointly). In both settings, CorPipe surpasses the other participants by large margins of 3.9 and 2.8 percentage points, respectively. The source code and the trained model are available at https://github.com/ufal/crac2024-corpipe.