SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation

Scientific news reports serve as a bridge, adeptly translating complex research articles into reports that resonate with the broader public. The automated generation of such narratives enhances the accessibility of scholarly insights. In this paper, we present a new corpus to facilitate this paradigm development. Our corpus comprises a parallel compilation of academic publications and their corresponding scientific news reports across nine disciplines. To demonstrate the utility and reliability of our dataset, we conduct an extensive analysis, highlighting the divergences in readability and brevity between scientific news narratives and academic manuscripts. We benchmark our dataset employing state-of-the-art text generation models. The evaluation process involves both automatic and human evaluation, which lays the groundwork for future explorations into the automated generation of scientific news reports. The dataset and code related to this work are available at https://dongqi.me/projects/SciNews.

