Efficiently extracting valuable information from massive data has become an important research goal in the era of Big Data, and text summarization techniques have developed steadily to meet this demand. Recent work has also shown that Transformer-based pre-trained language models achieve strong results on a variety of tasks in Natural Language Processing (NLP). To address Chinese news summary generation and the application of the Transformer structure to Chinese text, this paper proposes CNsum, a Chinese news text summarization model based on the Transformer structure, and evaluates it on Chinese datasets such as THUCNews. The experimental results show that CNsum achieves higher ROUGE scores than the baseline models, indicating that it outperforms them on this task.