We present GENTLE, a new mixed-genre English challenge corpus totaling 17K tokens and consisting of 8 unusual text types for out-of-domain evaluation: dictionary entries, esports commentaries, legal documents, medical notes, poetry, mathematical proofs, syllabuses, and threat letters. GENTLE is manually annotated for a variety of popular NLP tasks, including syntactic dependency parsing, entity recognition, coreference resolution, and discourse parsing. We evaluate state-of-the-art NLP systems on GENTLE and find that, on every task, performance degrades severely for at least some genres, which indicates GENTLE's utility as an evaluation dataset for NLP systems.