Initiated by the University Consortium for Geographic Information Science (UCGIS), the GIS&T Body of Knowledge (BoK) is a community-driven endeavor to define, develop, and document topics related to geographic information science and technologies (GIS&T). In recent years, the GIS&T BoK has undergone rigorous development in terms of topic reorganization and content updating, resulting in a new digital version of the project. While the BoK topics provide useful materials for researchers and students to learn about GIS, the semantic relationships among the topics, such as semantic similarity, should also be identified so that better, automated topic navigation can be achieved. Currently, related topics are defined manually by editors or authors, which may leave the assessment of topic relationships incomplete. To address this challenge, our research evaluates the effectiveness of multiple natural language processing (NLP) techniques in extracting semantics from text, including both deep neural networks and traditional machine learning approaches. In addition, a novel text summarizer, KACERS (Keyword-Aware Cross-Encoder-Ranking Summarizer), is proposed to generate semantic summaries of scientific publications. By identifying the semantic linkages among key topics, this work provides guidance for the future development and content organization of the GIS&T BoK project. It also offers a new perspective on the use of machine learning techniques for analyzing scientific publications, and demonstrates the potential of the KACERS summarizer for semantic understanding of long text documents.
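As a rough illustration of what "semantic similarity among topics" can mean in the traditional machine learning setting, the following minimal sketch ranks topics by TF-IDF cosine similarity. It is not the method evaluated in this work and not KACERS; the topic names and descriptions are made-up placeholders rather than BoK content, and the helper names (`tokenize`, `tfidf_vectors`, `cosine`, `related_topics`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Very rough tokenizer: lowercase, split on whitespace, keep alphabetic words.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (a dict) per document.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related_topics(query, topics):
    # Rank every other topic by similarity to `query`, most similar first.
    names = list(topics)
    vectors = dict(zip(names, tfidf_vectors([topics[n] for n in names])))
    scores = [(n, cosine(vectors[query], vectors[n])) for n in names if n != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Placeholder topic descriptions, invented for illustration only.
TOPICS = {
    "Spatial interpolation": "estimating values at unsampled locations from nearby sample points",
    "Kriging": "geostatistical method estimating values at unsampled locations using a variogram",
    "Web mapping": "delivering interactive maps to browsers through tile services",
}

if __name__ == "__main__":
    for name, score in related_topics("Spatial interpolation", TOPICS):
        print(f"{name}: {score:.3f}")
```

A bag-of-words measure like this only captures shared vocabulary; a neural encoder would be needed to relate topics that describe the same idea in different words, which is the kind of comparison the abstract refers to.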