Many research articles are now prefaced with research highlights that summarize the main findings of the paper. Highlights not only help researchers identify the contributions of a paper quickly and precisely, but also enhance the discoverability of the article via search engines. We aim to automatically construct research highlights given certain segments of a research paper. We use a pointer-generator network with a coverage mechanism and a contextual embedding layer at the input that encodes the input tokens into SciBERT embeddings. We test our model on a benchmark dataset, CSPubSum, and also present MixSub, a new multi-disciplinary corpus of papers for automatic research highlight generation. On both CSPubSum and MixSub, the proposed model achieves the best performance compared to related variants and other models proposed in the literature. On the CSPubSum dataset, our model performs best when the input is only the abstract of a paper, as opposed to other segments of the paper. It achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 38.26, 14.26 and 35.51, respectively, a METEOR score of 32.62, and a BERTScore F1 of 86.65, outperforming all other baselines. On the new MixSub dataset, where only the abstract is the input, our proposed model (when trained on the whole training corpus without distinguishing between the subject categories) achieves ROUGE-1, ROUGE-2 and ROUGE-L F1-scores of 31.78, 9.76 and 29.3, respectively, a METEOR score of 24.00, and a BERTScore F1 of 85.25.
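To make the pointer-generator and coverage ideas concrete, the following is a minimal sketch of a single decoding step. It assumes the standard formulation of pointer-generator networks (See et al., 2017), which may differ in detail from the model evaluated here. The function names, tokens, and probabilities are hypothetical and chosen only for illustration; the SciBERT embedding layer is not modeled.

```python
from typing import Dict, List, Tuple


def final_distribution(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """Mix generation and copy probabilities for one decoder step.

    Assumed form: P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of the
    attention weights on every source position that holds w.
    """
    # Generation part: scale the vocabulary distribution by p_gen.
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: add attention mass for each source token, so that
    # out-of-vocabulary source words can still receive probability.
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


def coverage_step(
    attention: List[float], coverage: List[float]
) -> Tuple[float, List[float]]:
    """Return the coverage loss for this step and the updated coverage.

    The coverage vector is the sum of attention from earlier steps; the
    loss sum_i min(a_i, c_i) penalizes attending to the same positions again.
    """
    loss = sum(min(a, c) for a, c in zip(attention, coverage))
    updated = [a + c for a, c in zip(attention, coverage)]
    return loss, updated


# Illustrative values only (not taken from the paper).
source = ["scibert", "encodes", "tokens"]
dist = final_distribution(
    p_gen=0.6,
    vocab_dist={"the": 0.5, "tokens": 0.5},
    attention=[0.7, 0.2, 0.1],
    source_tokens=source,
)
loss_1, cov = coverage_step([0.7, 0.2, 0.1], [0.0, 0.0, 0.0])
loss_2, cov = coverage_step([0.6, 0.3, 0.1], cov)
```

In this toy example, "scibert" is absent from the vocabulary distribution yet still receives probability through the copy term, and the second step incurs a larger coverage loss than the first because it revisits positions that were already attended to.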