Seq2seq coreference models have introduced a new paradigm for coreference resolution by learning to generate text corresponding to coreference labels, without requiring task-specific parameters. While these models achieve new state-of-the-art performance, they do so at the cost of flexibility and efficiency. In particular, they do not efficiently handle incremental settings such as dialogue, where text must be processed sequentially. We propose a compressed representation to improve the efficiency of these methods in incremental settings. Our method works by extracting and re-organizing entity-level tokens and discarding the majority of other input tokens. On OntoNotes, our best model scores just 0.6 CoNLL F1 points below a full-prefix, incremental baseline while achieving a compression ratio of 1.8. On LitBank, where singleton mentions are annotated, it surpasses state-of-the-art performance. Our results indicate that discarding a large portion of input tokens in seq2seq resolvers is a feasible strategy for incremental coreference resolution.
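To make the compression idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): when processing a document incrementally, only tokens inside previously identified entity mention spans are retained from the prefix, and the rest are discarded. The function name `compress_prefix` and the span format are illustrative assumptions.

```python
from typing import List, Tuple

def compress_prefix(tokens: List[str],
                    mention_spans: List[Tuple[int, int]]) -> List[str]:
    """Keep only tokens inside entity mention spans (start, end inclusive),
    preserving document order; all other prefix tokens are discarded."""
    keep = set()
    for start, end in mention_spans:
        keep.update(range(start, end + 1))
    return [tok for i, tok in enumerate(tokens) if i in keep]

# Example: a 9-token prefix with three single-token mentions
prefix = "John met Mary at the park and he smiled".split()
mentions = [(0, 0), (2, 2), (7, 7)]   # "John", "Mary", "he"
compressed = compress_prefix(prefix, mentions)
print(compressed)                      # ['John', 'Mary', 'he']
print(len(prefix) / len(compressed))   # rough compression ratio: 3.0
```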