This paper describes the fifth edition of the Shared Task on Multilingual Coreference Resolution, held in conjunction with the CODI-CRAC 2026 workshop. Building on previous iterations, the task required participants to develop systems capable of mention identification and identity-based coreference clustering. The 2026 edition placed particular emphasis on long-range entities, defined as coreference chains that span long distances in the text, across many words and sentences. The task also expanded its linguistic scope by incorporating five new datasets and two additional languages. These additions draw on version 1.4 of CorefUD, a harmonized multilingual collection comprising 27 datasets in 19 languages. In total, ten systems participated, including four LLM-based approaches (three fine-tuned models and one few-shot approach). Although traditional systems maintained their lead, the LLM-based systems showed significant potential, suggesting that they may challenge established approaches in future editions.