The number of published research papers has grown exponentially in recent years, which makes it crucial to develop efficient and versatile methods for information extraction and knowledge discovery. To address this need, we propose a Semantic Knowledge Graph (SKG) that represents the corpus by integrating semantic concepts extracted from abstracts with other meta-information. Because the information it stores is both diverse and rich, the SKG can support a wide range of semantic queries over the academic literature. To extract knowledge from unstructured text, we develop a Knowledge Extraction Module that includes a semi-supervised pipeline for entity extraction and entity normalization. We also create an ontology that integrates the extracted concepts with the other meta-information, which allows us to build the SKG. Furthermore, we design and develop a dataflow system that demonstrates how to conduct various semantic queries flexibly and interactively over the SKG. To demonstrate the effectiveness of our approach, we apply it to the visualization literature and present real-world use cases that show the usefulness of the SKG. The dataset and code for this work are available at https://osf.io/aqv8p/?view_only=2c26b36e3e3941ce999df47e4616207f.
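To make the pipeline described above more concrete, here is a minimal sketch of how normalized concepts and meta-information might be combined into a queryable graph. It is not the authors' implementation. The alias table, the relation names (`hasConcept`, `writtenBy`, `publishedIn`), and the class `SemanticKnowledgeGraph` are all hypothetical. A plain dictionary lookup stands in for the semi-supervised entity-normalization pipeline, and in-memory triples stand in for whatever ontology and graph store the SKG actually uses.

```python
from collections import defaultdict

# HYPOTHETICAL alias table: a simple stand-in for the paper's semi-supervised
# entity-normalization step. It maps surface forms found in abstracts to one
# canonical concept name.
ALIASES = {
    "t-sne": "t-SNE",
    "tsne": "t-SNE",
    "t-distributed stochastic neighbor embedding": "t-SNE",
    "user study": "user study",
    "user studies": "user study",
    "scatterplot": "scatterplot",
    "scatter plot": "scatterplot",
}


def normalize(mention: str) -> str:
    """Map a raw mention to its canonical form; unknown mentions are only stripped."""
    return ALIASES.get(mention.strip().lower(), mention.strip())


class SemanticKnowledgeGraph:
    """Toy SKG stored as (subject, relation, object) triples plus two lookup indexes.

    The relation names used below are illustrative assumptions, not the paper's ontology.
    """

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        # (relation, object) -> subjects, e.g. ("hasConcept", "t-SNE") -> {"P1"}
        self.by_relation_object: dict[tuple[str, str], set[str]] = defaultdict(set)
        # (subject, relation) -> objects, e.g. ("P1", "hasConcept") -> {"t-SNE"}
        self.by_subject_relation: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.triples.add((subject, relation, obj))
        self.by_relation_object[(relation, obj)].add(subject)
        self.by_subject_relation[(subject, relation)].add(obj)

    def add_paper(self, paper_id: str, year: int, authors: list[str], mentions: list[str]) -> None:
        """Link a paper to its meta-information and to its normalized concepts."""
        self.add(paper_id, "publishedIn", str(year))
        for author in authors:
            self.add(paper_id, "writtenBy", author)
        for mention in mentions:
            self.add(paper_id, "hasConcept", normalize(mention))

    def papers_with_concepts(self, *concepts: str) -> set[str]:
        """Conjunctive query: papers linked to ALL of the given concepts."""
        sets = [self.by_relation_object[("hasConcept", normalize(c))] for c in concepts]
        return set.intersection(*sets) if sets else set()

    def co_occurring_concepts(self, concept: str) -> dict[str, int]:
        """Count how many papers pair each other concept with `concept`."""
        canonical = normalize(concept)
        counts: dict[str, int] = defaultdict(int)
        for paper in self.papers_with_concepts(canonical):
            for other in self.by_subject_relation[(paper, "hasConcept")]:
                if other != canonical:
                    counts[other] += 1
        return dict(counts)


if __name__ == "__main__":
    skg = SemanticKnowledgeGraph()
    skg.add_paper("P1", 2019, ["A. Author"], ["t-SNE", "user studies"])
    skg.add_paper("P2", 2020, ["B. Author"], ["tsne", "scatter plot"])
    skg.add_paper("P3", 2021, ["A. Author"], ["Scatterplot", "user study"])
    # Different surface forms resolve to the same canonical concept.
    print(sorted(skg.papers_with_concepts("t-distributed stochastic neighbor embedding")))
    print(skg.co_occurring_concepts("t-SNE"))
```

Because every mention is normalized before it enters the graph, the query for "t-distributed stochastic neighbor embedding" returns P1 and P2, even though those abstracts wrote "t-SNE" and "tsne". Keeping concepts and meta-information in a single triple store is what lets one query combine them, for example by intersecting `hasConcept` results with `writtenBy` results.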