State-of-the-art coreference resolutions systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e.g., information extraction with large corpora). The leading word-level coreference system (WL-coref) attains 96.6% of these SOTA systems' performance while being much more efficient. In this work, we identify a routine yet important failure case of WL-coref: dealing with conjoined mentions such as 'Tom and Mary'. We offer a simple yet effective solution that improves the performance on the OntoNotes test set by 0.9% F1, shrinking the gap between efficient word-level coreference resolution and expensive SOTA approaches by 34.6%. Our Conjunction-Aware Word-level coreference model (CAW-coref) and code is available at https://github.com/KarelDO/wl-coref.
翻译:最先进的共指消解系统依赖每篇文档多次调用大语言模型(LLM),因此在许多应用场景(如大规模语料的信息抽取)中成本过高。领先的词级共指消解系统WL-coref在保持更高效率的同时,达到了这些SOTA系统性能的96.6%。本文发现WL-coref存在一个常规但重要的失败案例:处理诸如“Tom and Mary”的连词提及。我们提出一种简单而有效的解决方案,在OntoNotes测试集上将F1值提升0.9%,将高效词级共指消解与昂贵的SOTA方法之间的差距缩小34.6%。我们的连词感知词级共指消解模型CAW-coref及代码可从https://github.com/KarelDO/wl-coref获取。