Automatic keyword extraction from academic papers is a key area of interest in natural language processing and information retrieval. Whereas previous research has mainly drawn on the abstract and references for keyword extraction, this paper focuses on the highlights section, a short summary of a paper's key findings and contributions that offers readers a quick overview of the research. Our observations indicate that highlights contain valuable keyword information that can effectively complement the abstract. To investigate the impact of incorporating highlights into unsupervised keyword extraction, we evaluate three input scenarios: the abstract alone, the highlights alone, and the two combined. Experiments with four unsupervised models on Computer Science (CS) and Library and Information Science (LIS) datasets show that combining the abstract with the highlights significantly improves extraction performance. Furthermore, we examine the differences in keyword coverage and content between the abstract and the highlights, and explore how these differences influence extraction outcomes. The data and code are available at https://github.com/xiangyi-njust/Highlight-KPE.
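The three input scenarios described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the frequency-based scorer is a toy stand-in rather than one of the four unsupervised models used in the paper, and the names `build_inputs`, `extract_keywords`, and `f1_at_k`, the stopword list, and the sample texts are hypothetical and not taken from the released code.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the paper's list).
STOPWORDS = {"a", "an", "and", "the", "of", "for", "in", "on", "to",
             "we", "is", "are", "with", "that", "this", "by", "from"}


def candidates(text, max_len=3):
    """Yield stopword-free n-grams (1..max_len words) that do not cross punctuation."""
    for segment in re.split(r"[^a-z\s]+", text.lower()):
        tokens = segment.split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if not any(tok in STOPWORDS for tok in gram):
                    yield " ".join(gram)


def extract_keywords(text, k=5):
    """Toy unsupervised extractor: rank phrases by frequency times word count."""
    counts = Counter(candidates(text))
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(counts, key=lambda p: (-counts[p] * len(p.split()), p))
    return ranked[:k]


def f1_at_k(predicted, gold):
    """Exact-match F1 between predicted and gold keyword sets."""
    pred_set, gold_set = set(predicted), set(gold)
    hits = len(pred_set & gold_set)
    if hits == 0:
        return 0.0
    precision = hits / len(pred_set)
    recall = hits / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def build_inputs(abstract, highlights):
    """Return the three input scenarios: abstract, highlights, and both combined."""
    return {
        "abstract": abstract,
        "highlights": highlights,
        "abstract+highlights": abstract + ". " + highlights,
    }


if __name__ == "__main__":
    # Hypothetical sample texts and gold keywords, for demonstration only.
    abstract = "We study keyword extraction from academic papers."
    highlights = "Highlights complement the abstract for keyword extraction."
    gold = ["keyword extraction", "highlights"]
    for name, text in build_inputs(abstract, highlights).items():
        predicted = extract_keywords(text, k=5)
        print(name, predicted, round(f1_at_k(predicted, gold), 3))
```

Running each scenario through the same extractor and the same metric keeps the comparison focused on the input text alone; a real replication would swap the toy scorer for the models and evaluation protocol in the linked repository.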