Coreference resolution (CR), the task of identifying expressions that refer to the same real-world entity, is a fundamental challenge in natural language processing (NLP). This paper surveys recent advances in CR, covering both coreference and anaphora resolution. We critically analyze the corpora that have driven CR research, highlighting their strengths, limitations, and suitability for different tasks. We examine the range of evaluation metrics used to assess CR systems, discussing their advantages and disadvantages and arguing for more nuanced, task-specific metrics. Tracing the evolution of CR algorithms, we give a detailed overview of methodologies, from rule-based approaches to cutting-edge deep learning architectures. We cover mention-pair, entity-based, cluster-ranking, sequence-to-sequence, and graph neural network models, explaining their theoretical foundations and their performance on benchmark datasets. Recognizing the particular challenges of Persian CR, we devote a focused analysis to this under-resourced language. We examine existing Persian CR systems and highlight the emergence of end-to-end neural models that leverage pre-trained language models such as ParsBERT. This review is intended as a resource for researchers and practitioners, offering a comprehensive overview of the current state of the art in CR, identifying key challenges, and charting directions for future research in this rapidly evolving field.