Opinion summarization is expected to digest larger review sets and provide summaries from different perspectives. However, most existing solutions struggle to condense extensive review sets and to offer opinion summaries from multiple angles because they lack a mechanism for information selection. To this end, we propose SUBSUMM, a supervised summarization framework for large-scale multi-perspective opinion summarization. SUBSUMM consists of a set of review sampling strategies and a two-stage training scheme. The sampling strategies take sentiment orientation and contrastive information value into account, so that review subsets of different perspectives and quality levels can be selected. The summarizer then learns from the sub-optimal and the optimal subsets in succession in order to capitalize on the massive input. Experimental results on the AmaSum and Rotten Tomatoes datasets demonstrate that SUBSUMM is adept at generating pros, cons, and verdict summaries from hundreds of input reviews. Furthermore, our in-depth analysis verifies that the advanced selection of review subsets and the two-stage training scheme are vital to boosting summarization performance.
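As an illustration of the subset selection and two-stage ordering described in the abstract, the following is a minimal sketch and not the SUBSUMM implementation. Everything in it is an assumption made for illustration: the `Review` type, the use of star ratings as a proxy for sentiment orientation, the rating thresholds, and `overlap_score`, a plain word-overlap measure that stands in for the contrastive information value named in the abstract. All function names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Review:
    text: str
    rating: int  # assumption: a 1-5 star rating used as a sentiment proxy


def overlap_score(review: Review, reference: str) -> float:
    """Fraction of reference-summary words found in the review.

    Assumption: a simple stand-in for information value, not the
    contrastive measure the abstract refers to.
    """
    ref_words = set(reference.lower().split())
    if not ref_words:
        return 0.0
    return len(ref_words & set(review.text.lower().split())) / len(ref_words)


def sentiment_pool(reviews, perspective):
    """Filter reviews by the rating proxy for the requested perspective."""
    if perspective == "pros":
        return [r for r in reviews if r.rating >= 4]  # hypothetical threshold
    if perspective == "cons":
        return [r for r in reviews if r.rating <= 2]  # hypothetical threshold
    return list(reviews)  # "verdict": keep every sentiment


def suboptimal_subset(reviews, perspective, k, seed=0):
    """Stage-one input: a random sample from the sentiment-filtered pool."""
    pool = sentiment_pool(reviews, perspective)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def optimal_subset(reviews, perspective, reference, k):
    """Stage-two input: the k pool reviews that best overlap the reference."""
    pool = sentiment_pool(reviews, perspective)
    ranked = sorted(pool, key=lambda r: overlap_score(r, reference), reverse=True)
    return ranked[:k]


def two_stage_inputs(reviews, perspective, reference, k):
    """Return the subsets in the order a summarizer would be trained on them."""
    return [
        ("stage_1", suboptimal_subset(reviews, perspective, k)),
        ("stage_2", optimal_subset(reviews, perspective, reference, k)),
    ]


reviews = [
    Review("battery life is great", 5),
    Review("great screen and sound", 4),
    Review("arrived on time", 5),
    Review("battery died quickly", 1),
    Review("screen cracked", 2),
]
stages = two_stage_inputs(reviews, "pros", "great battery life", k=2)
```

In this sketch, `stages` holds a sampled subset followed by a ranked subset for the "pros" perspective; a real pipeline would fine-tune the summarizer on the first before continuing on the second, and would need a reference summary per perspective at training time.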