Large autoregressive generative models have emerged as the cornerstone for achieving the highest performance across several Natural Language Processing tasks. However, the urge to attain superior results has, at times, led to the premature replacement of carefully designed task-specific approaches without exhaustive experimentation. The Coreference Resolution task is no exception; all recent state-of-the-art solutions adopt large generative autoregressive models that outperform encoder-based discriminative systems. In this work, we challenge this recent trend by introducing Maverick, a carefully designed, yet simple, pipeline that enables running a state-of-the-art Coreference Resolution system within the constraints of an academic budget, outperforming models with up to 13 billion parameters with as few as 500 million parameters. Maverick achieves state-of-the-art performance on the CoNLL-2012 benchmark, training with up to 0.006x the memory resources and obtaining 170x faster inference compared to previous state-of-the-art systems. We extensively validate the robustness of the Maverick framework with an array of diverse experiments, reporting improvements over prior systems in data-scarce, long-document, and out-of-domain settings. We release our code and models for research purposes at https://github.com/SapienzaNLP/maverick-coref.