Summarization of scientific text has shown clear benefits for both the research community and society at large. Because scientific text has distinctive characteristics and the input to multi-document summarization is substantially long, the task requires effective embedding generation and text truncation that does not discard important information. To address these issues, we propose SKT5SciSumm, a hybrid framework for multi-document scientific summarization (MDSS). We use the Sentence-Transformer version of Scientific Paper Embeddings using Citation-Informed Transformers (SPECTER) to encode sentences, which enables efficient extractive summarization via k-means clustering. We then employ models from the T5 family to generate abstractive summaries from the extracted sentences. SKT5SciSumm achieves state-of-the-art performance on the Multi-XScience dataset. Through extensive experiments and evaluation, we show that the framework achieves strong results with comparatively simple models, highlighting its potential to advance multi-document summarization of scientific text.