SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation

Elizabeth Clark, Shruti Rijhwani, Sebastian Gehrmann, Joshua Maynez, Roee Aharoni, Vitaly Nikolaev, Thibault Sellam, Aditya Siddhant, Dipanjan Das, Ankur P. Parikh

Reliable automatic evaluation of summarization systems is challenging due to the multifaceted and subjective nature of the task. This is especially the case for languages other than English, where human evaluations are scarce. In this work, we introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation. SEAHORSE consists of 96K summaries with human ratings along 6 quality dimensions: comprehensibility, repetition, grammar, attribution, main ideas, and conciseness. It covers 6 languages, 9 systems, and 4 datasets. As a result of its size and scope, SEAHORSE can serve both as a benchmark for evaluating learnt metrics and as a large-scale resource for training such metrics. We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE (Honovich et al., 2022) and mFACE (Aharoni et al., 2022). We make SEAHORSE publicly available for future research on multilingual and multifaceted summarization evaluation.

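To make the "benchmark for learnt metrics" use concrete, the following is a minimal sketch of how a metric could be meta-evaluated against per-dimension human ratings. It is not the authors' evaluation code. The record layout (`RatedSummary`), the underscore spelling of the dimension keys, the encoding of human ratings as numbers in [0, 1], and the choice of Pearson correlation as the agreement measure are all assumptions made for illustration; the abstract does not specify any of them.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List

# The six quality dimensions named in the abstract (key spelling is hypothetical).
DIMENSIONS = (
    "comprehensibility",
    "repetition",
    "grammar",
    "attribution",
    "main_ideas",
    "conciseness",
)


@dataclass
class RatedSummary:
    """Hypothetical record: one summary with human and metric scores."""
    language: str
    system: str
    human: Dict[str, float]   # assumed encoding: higher means better
    metric: Dict[str, float]  # scores from the learnt metric under test


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def correlation_by_dimension(rows: List[RatedSummary]) -> Dict[str, float]:
    """Correlate metric scores with human ratings separately per dimension."""
    return {
        dim: pearson([r.human[dim] for r in rows], [r.metric[dim] for r in rows])
        for dim in DIMENSIONS
    }
```

Reporting one correlation per dimension, rather than a single pooled number, keeps the evaluation multifaceted: a metric may track attribution well while tracking conciseness poorly. The same function could be applied to per-language subsets of `rows` to compare languages.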
