The value of structured scholarly knowledge for research and society at large is well understood, but producing scholarly knowledge (i.e., knowledge traditionally published in articles) in structured form remains a challenge. We propose an approach for automatically extracting scholarly knowledge from published software packages by static analysis of their metadata and contents (scripts and data) and populating a scholarly knowledge graph with the extracted knowledge. Our approach mines scientific software packages linked to article publications: it extracts their metadata and analyzes the Abstract Syntax Tree (AST) of the source code to obtain information about the data used and produced, as well as the operations performed on that data. The resulting knowledge graph includes articles, software package metadata, and the computational techniques applied to the input data used as materials in the research work. The knowledge graph also includes the results reported as scholarly knowledge in articles.
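As an illustration of the AST analysis step, the following is a minimal sketch in Python using only the standard `ast` module. It is not the implementation described here; it assumes the analyzed scripts are Python, and the helper names (`call_name`, `first_string_arg`, `extract_data_flow`) and the prefix heuristics for spotting reads and writes are hypothetical. Calls whose name looks like a read or write and whose first argument is a string literal are treated as data inputs or outputs; every other call is recorded as an operation.

```python
import ast

# Hypothetical heuristics: which call names count as reading or writing data.
READ_PREFIXES = ("read_", "load")
WRITE_PREFIXES = ("to_", "save", "write")


def call_name(node: ast.Call) -> str:
    """Return the short name of the called function, e.g. 'read_csv' for pd.read_csv(...)."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def first_string_arg(node: ast.Call):
    """Return the first positional argument if it is a string literal, else None."""
    if node.args:
        arg = node.args[0]
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            return arg.value
    return None


def extract_data_flow(source: str) -> dict:
    """Statically collect data inputs, data outputs, and operations from a script."""
    tree = ast.parse(source)  # the script is parsed, never executed
    found = {"inputs": [], "outputs": [], "operations": []}
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        name = call_name(node)
        if not name:
            continue
        path = first_string_arg(node)
        if name.startswith(READ_PREFIXES) and path is not None:
            found["inputs"].append(path)
        elif name.startswith(WRITE_PREFIXES) and path is not None:
            found["outputs"].append(path)
        else:
            found["operations"].append(name)
    return found


# A made-up script standing in for the contents of a published software package.
SAMPLE = '''
import pandas as pd
df = pd.read_csv("measurements.csv")
summary = df.groupby("site").mean()
summary.to_csv("summary.csv")
'''

result = extract_data_flow(SAMPLE)
print(result["inputs"], result["outputs"], sorted(result["operations"]))
```

The extracted inputs, outputs, and operations could then be linked to the article and package metadata as nodes and edges of the knowledge graph. A real pipeline would also need to resolve file paths held in variables, which this sketch ignores.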