Findings of the Fifth Shared Task on Multilingual Coreference Resolution: Expanding Datasets for Long-Range Entities

This paper describes the fifth edition of the Shared Task on Multilingual Coreference Resolution, held in conjunction with the CODI-CRAC 2026 workshop. As in previous editions, participants were required to develop systems that perform mention identification and identity-based coreference clustering. The 2026 edition placed particular emphasis on long-range entities, that is, coreference chains whose mentions are spread across many words and sentences. The task also broadened its linguistic scope by adding five new datasets and two additional languages. These additions draw on version 1.4 of CorefUD, a harmonized multilingual collection comprising 27 datasets in 19 languages. In total, ten systems participated, including four LLM-based approaches (three fine-tuned models and one few-shot approach). Although traditional systems maintained their lead, the LLM-based systems showed significant potential, suggesting that they may challenge established approaches in future editions.

