We propose P-Summ, an unsupervised algorithm that generates an extractive summary of a scientific scholarly document tailored to the personal knowledge needs of the user. The method explores the latent semantic space of the document, exposed by Weighted Non-negative Matrix Factorization, and scores sentences according to those knowledge needs. Its novelty lies in its ability to include desired knowledge in the personal summary and to exclude unwanted knowledge from it. We also propose a multi-granular evaluation framework that assesses the quality of generated personal summaries at three levels of granularity: sentence, term, and semantic. The framework uses a system-generated generic summary, rather than a human-written summary, as the gold standard for evaluating the personal summary produced by the algorithm. At the semantic level, the effectiveness of the algorithm is evaluated against both the reference summary and the knowledge signals. We evaluate P-Summ on four datasets of scientific articles. Our empirical investigation shows that the proposed method can meet the negative (or positive) knowledge preferences of the user.
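To make the idea of scoring sentences in a weighted latent space concrete, the following is a minimal sketch and not the P-Summ algorithm itself. It assumes a term-by-sentence matrix `A`, a weight matrix `W` in which terms the user wants to avoid receive a low weight, and the standard multiplicative-update rule for weighted NMF that minimizes the sum over entries of `W * (A - U @ V) ** 2`. The function names, the toy data, and the scoring rule (weighted mass of each sentence in the reconstructed matrix) are all illustrative assumptions.

```python
import numpy as np


def weighted_error(A, W, U, V):
    """Weighted squared reconstruction error: sum of W * (A - U V)^2."""
    return float(np.sum(W * (A - U @ V) ** 2))


def weighted_nmf(A, W, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor A ~ U @ V with non-negative U, V using multiplicative updates.

    A: (terms x sentences) non-negative matrix.
    W: non-negative weights with the same shape as A.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_sents = A.shape
    U = rng.random((n_terms, rank)) + eps
    V = rng.random((rank, n_sents)) + eps
    WA = W * A
    for _ in range(n_iter):
        U *= (WA @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WA) / (U.T @ (W * (U @ V)) + eps)
    return U, V


def score_sentences(W, U, V):
    """Hypothetical scoring rule (an assumption, not the paper's):
    the weighted mass of each sentence column in the reconstruction."""
    return (W * (U @ V)).sum(axis=0)


# Toy data: 4 terms x 3 sentences (rows are terms, columns are sentences).
A = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 1],
              [0, 1, 0]], dtype=float)

# Assumed knowledge signal: the user wants to avoid term 3, so it is down-weighted.
term_weights = np.array([1.0, 1.0, 1.0, 0.1])
W = np.repeat(term_weights[:, None], A.shape[1], axis=1)

U, V = weighted_nmf(A, W, rank=2)
scores = score_sentences(W, U, V)
top = np.argsort(-scores)[:2]  # indices of the two highest-scoring sentences
print("sentence scores:", scores)
print("selected sentences:", top)
```

Raising the weight of desired terms instead of lowering the weight of unwanted ones would, under the same assumptions, bias the selection toward positive knowledge preferences.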