We present CorPipe, the winning entry to the CRAC 2023 Shared Task on Multilingual Coreference Resolution. Our system is an improved version of our earlier multilingual coreference pipeline, and it surpasses the other participants by a large margin of 4.5 percentage points. CorPipe first performs mention detection, followed by coreference linking via an antecedent-maximization approach on the retrieved spans. Both tasks are trained jointly on all available corpora using a shared pretrained language model. Our main improvements are support for inputs longer than 512 subwords and a change to the mention decoding that enables ensembling. The source code is available at https://github.com/ufal/crac2023-corpipe.
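To illustrate the linking step described in the abstract, the following is a minimal sketch of antecedent-maximization over already detected mentions. It is not taken from the CorPipe source code: the function name `link_mentions`, the score matrix layout, and the convention that a mention selecting itself starts a new entity are assumptions made for this example, and the scores would in practice come from the shared pretrained language model.

```python
from typing import List


def link_mentions(scores: List[List[float]]) -> List[int]:
    """Greedy antecedent-maximization over detected mentions.

    scores[i][j] (for j <= i) is a hypothetical score for mention i
    taking mention j as its antecedent; j == i is assumed to mean
    "no antecedent, start a new entity".
    Returns one cluster id per mention, in document order.
    """
    cluster_of: List[int] = []
    next_cluster = 0
    for i, row in enumerate(scores):
        # Only the mention itself and preceding mentions are candidates.
        candidates = row[: i + 1]
        best = max(range(i + 1), key=lambda j: candidates[j])
        if best == i:
            # The mention chose itself: open a new cluster.
            cluster_of.append(next_cluster)
            next_cluster += 1
        else:
            # Join the cluster of the highest-scoring antecedent.
            cluster_of.append(cluster_of[best])
    return cluster_of


# Four mentions with made-up scores (illustrative values only).
example_scores = [
    [0.0],
    [2.0, 0.5],
    [0.1, 0.2, 1.5],
    [0.3, 0.1, 2.2, 0.4],
]
clusters = link_mentions(example_scores)
```

Because each mention makes an independent argmax decision over its candidates, the clusters follow directly from the chosen antecedent links and no separate clustering pass is needed in this sketch.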