We describe the winning submission to the CRAC 2022 Shared Task on Multilingual Coreference Resolution. Our system first performs mention detection and then coreference linking on the retrieved spans with an antecedent-maximization approach; both tasks are fine-tuned jointly with shared Transformer weights. We report results of fine-tuning a wide range of pretrained models. At the center of this contribution are fine-tuned multilingual models. We found that a single large multilingual model with a sufficiently large encoder increases performance on all datasets, with the benefit not limited to underrepresented languages or groups of typologically related languages. The source code is available at https://github.com/ufal/crac2022-corpipe.
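To make the linking step concrete, the following is a minimal sketch of antecedent-maximization decoding, not the authors' implementation. It assumes mention detection has already produced an ordered list of mentions and that some model has assigned each mention a score for every earlier mention plus a "no antecedent" option. The function name `link_mentions`, the layout of `scores`, and the use of a dummy slot at index 0 are all illustrative assumptions; in the described system the scores would come from the shared Transformer encoder.

```python
from typing import List, Sequence


def link_mentions(scores: Sequence[Sequence[float]]) -> List[List[int]]:
    """Greedy antecedent-maximization decoding (illustrative sketch).

    Assumed layout (hypothetical, not taken from the paper):
      scores[i][0]     -> score that mention i has no antecedent
      scores[i][j + 1] -> score that earlier mention j (j < i) is the antecedent
    Returns coreference clusters as lists of mention indices.
    """
    cluster_of: List[int] = []       # cluster id assigned to each mention
    clusters: List[List[int]] = []   # cluster id -> mention indices

    for i, row in enumerate(scores):
        # Only the dummy slot and the i earlier mentions are valid candidates.
        candidates = row[: i + 1]
        # Pick the highest-scoring candidate; ties go to the lowest index.
        best = max(range(len(candidates)), key=candidates.__getitem__)

        if best == 0:
            # No antecedent: the mention starts a new cluster.
            cluster_of.append(len(clusters))
            clusters.append([i])
        else:
            # Join the cluster of the chosen antecedent (mention best - 1).
            cluster_id = cluster_of[best - 1]
            cluster_of.append(cluster_id)
            clusters[cluster_id].append(i)

    return clusters


# Hand-written toy scores for four mentions (made-up numbers).
toy_scores = [
    [0.0],
    [0.1, 0.9],
    [0.8, 0.1, 0.3],
    [0.0, 0.2, 0.1, 0.7],
]
toy_clusters = link_mentions(toy_scores)
```

With these toy scores, mention 1 links to mention 0 and mention 3 links to mention 2, while mentions 0 and 2 each start a new cluster. Because every mention selects exactly one best antecedent, clusters arise as the transitive closure of the chosen links.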