We present HyperSum, an extractive summarization framework that combines the efficiency of traditional lexical summarization with the accuracy of contemporary neural approaches. HyperSum exploits the pseudo-orthogonality that emerges when vectors are randomly initialized in extremely high dimensions (the "blessing of dimensionality") to construct representative sentence embeddings at low computational cost. Simply clustering the resulting embeddings and extracting the cluster medoids yields competitive summaries. HyperSum often outperforms state-of-the-art summarizers, in terms of both summary accuracy and faithfulness, while being 10 to 100 times faster. We open-source HyperSum as a strong baseline for unsupervised extractive summarization.
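The pipeline described above (random high-dimensional vectors, sentence embeddings, clustering, medoid extraction) can be illustrated with a minimal sketch. This is not the released HyperSum code. The abstract does not say how word vectors are composed into sentence embeddings, so the sketch assumes bipolar (+1/-1) word vectors bundled by summation, a dimension of 10,000, cosine similarity, and a tiny hand-written k-medoids loop. All function names are hypothetical.

```python
import numpy as np


def random_hypervectors(n, dim, rng):
    """Draw n random bipolar (+1/-1) vectors of dimension dim (assumed encoding)."""
    return rng.choice(np.array([-1.0, 1.0]), size=(n, dim))


def cosine_matrix(x):
    """Pairwise cosine similarities between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def embed_sentences(sentences, dim, rng):
    """Embed each sentence as the sum of its words' random hypervectors.

    Summation ("bundling") is an assumption made for this sketch.
    """
    vocab = {}
    embeddings = np.zeros((len(sentences), dim))
    for i, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            if word not in vocab:
                vocab[word] = random_hypervectors(1, dim, rng)[0]
            embeddings[i] += vocab[word]
    return embeddings


def k_medoids(sim, k, n_iter=20):
    """Small k-medoids on a similarity matrix; returns sorted medoid indices."""
    n = sim.shape[0]
    if n == 0:
        return []
    k = min(k, n)
    # Deterministic start: the most central point, then farthest-first picks.
    medoids = [int(np.argmax(sim.sum(axis=1)))]
    while len(medoids) < k:
        closest = sim[:, medoids].max(axis=1)
        closest[medoids] = np.inf  # never re-pick an existing medoid
        medoids.append(int(np.argmin(closest)))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        # Assign every point to its most similar medoid.
        labels = np.argmax(sim[:, medoids], axis=1)
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member most similar to its own cluster.
                within = sim[np.ix_(members, members)].sum(axis=1)
                updated[c] = members[np.argmax(within)]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    return sorted(int(m) for m in medoids)


def summarize(sentences, k, dim=10_000, seed=0):
    """Return k medoid sentences, kept in their original document order."""
    rng = np.random.default_rng(seed)
    sim = cosine_matrix(embed_sentences(sentences, dim, rng))
    return [sentences[i] for i in k_medoids(sim, k)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns two of the input sentences verbatim, which is what makes the approach extractive. Because distinct random word vectors are nearly orthogonal at this dimension, sentences that share no words have cosine similarity close to zero, while sentences with overlapping vocabulary stay close together.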