In recent years, text summarization has regained attention thanks to research on neural network models. Most current summarization methods based on neural network models are supervised and require large-scale datasets, which are difficult to obtain in practical applications. In this paper, we model the task of extractive text summarization from the perspective of information theory and describe unsupervised extractive methods within a uniform framework. To improve the feature distribution and to decrease the mutual information among summary sentences, we propose a new sentence extraction strategy that can be applied to existing unsupervised extractive methods. Experiments on different datasets show that the strategy is effective and that the results are in line with expectations.
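To make the idea of reducing shared information among selected sentences concrete, the sketch below shows a generic greedy extractor that trades relevance against redundancy. It is not the strategy proposed in the paper, which the abstract does not specify. Bag-of-words cosine similarity is used here only as a crude stand-in for mutual information, and the names `select_sentences`, `redundancy_weight`, and the helper functions are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; a deliberately simple sentence representation."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_sentences(sentences, k=2, redundancy_weight=1.0):
    """Greedily pick up to k sentence indices.

    Assumption: relevance is similarity to the whole document, and
    redundancy is the highest similarity to any already selected sentence.
    Both are proxies, not information-theoretic quantities.
    """
    vectors = [bag_of_words(s) for s in sentences]
    doc_vector = sum(vectors, Counter())
    relevance = [cosine(v, doc_vector) for v in vectors]

    selected = []
    while len(selected) < min(k, len(sentences)):
        best_index, best_score = None, -math.inf
        for i in range(len(sentences)):
            if i in selected:
                continue
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            score = relevance[i] - redundancy_weight * redundancy
            if score > best_score:
                best_index, best_score = i, score
        selected.append(best_index)

    # Return indices in document order so the summary reads naturally.
    return sorted(selected)
```

With `redundancy_weight=0.0` the selection reduces to ranking by relevance alone, so near-duplicate sentences can both be chosen. Raising the weight pushes the extractor toward sentences that overlap less with those already selected.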