Opinion summarization is the task of creating summaries that capture popular opinions from user reviews. In this paper, we introduce Geodesic Summarizer (GeoSumm), a novel system for unsupervised extractive opinion summarization. GeoSumm uses an encoder-decoder based representation learning model that generates representations of text as distributions over latent semantic units. GeoSumm generates these representations by performing dictionary learning over pre-trained text representations at multiple decoder layers. We then use these representations to quantify the relevance of review sentences with a novel scoring mechanism based on approximate geodesic distance. We use the relevance scores to identify popular opinions and compose general and aspect-specific summaries. GeoSumm achieves state-of-the-art performance on three opinion summarization datasets. We perform additional experiments to analyze the functioning of our model and showcase the generalization ability of GeoSumm across different domains.
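To make the scoring idea concrete, the following is a minimal sketch of relevance scoring with an approximate geodesic distance, not the implementation used in GeoSumm. It assumes each sentence is already represented as a probability distribution over latent semantic units, and it swaps in a common approximation: shortest paths over a k-nearest-neighbor graph. The local distance (arc-cosine of the Bhattacharyya coefficient), the choice of `k`, the toy `sentences` data, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import heapq
import math


def local_distance(p, q):
    # Assumed local metric: arc-cosine of the Bhattacharyya coefficient
    # between two discrete distributions.
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.acos(min(1.0, max(0.0, bc)))


def knn_graph(points, k):
    # Connect every point to its k nearest neighbors (symmetrized).
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (local_distance(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_paths(graph, source):
    # Dijkstra's algorithm; path lengths approximate geodesic distances.
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def relevance_scores(points, k=2):
    # Assumed scoring rule: a sentence is more relevant when its mean
    # approximate geodesic distance to the other sentences is smaller.
    graph = knn_graph(points, k)
    scores = []
    for i in range(len(points)):
        dist = shortest_paths(graph, i)
        reachable = [d for j, d in dist.items() if j != i and math.isfinite(d)]
        mean = sum(reachable) / len(reachable) if reachable else math.inf
        scores.append(-mean)
    return scores


# Hypothetical toy data: four sentences over three latent semantic units.
sentences = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.75, 0.15, 0.10],
    [0.10, 0.10, 0.80],  # an outlying, unpopular opinion
]
scores = relevance_scores(sentences, k=2)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setup the three similar sentences sit close together on the graph, so one of them receives the highest score, while the outlying sentence receives the lowest. An extractive summary would then be composed from the top-scoring sentences.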