Hierarchical Classification of Research Fields in the "Web of Science" Using Deep Learning

The scholarly publication space is growing steadily, not just in numbers but also in complexity, owing to collaboration between individuals within and across fields of research. This paper presents a hierarchical classification system that uses a scholarly publication's abstract to automatically assign it to a three-tier hierarchical label set of fields (discipline-field-subfield). This system enables a holistic view of the interdependence of research activities across these hierarchical tiers, both in terms of knowledge production (through articles) and impact (through citations). The classification system (44 disciplines, 738 fields, 1,501 subfields) uses, and is able to cope with, 160 million abstract snippets from Microsoft Academic Graph (version 2018-05-17) by means of batch training in a modularized and distributed fashion, in order to address and assess interdisciplinarity and inter-field classifications. In addition, we explore multi-class classification in both the single-label and multi-label settings. In total, we conducted 3,140 experiments. Across all models (Convolutional Neural Networks, Recurrent Neural Networks, Transformers), the classification accuracy exceeds 90% in 77.84% of the single-label classifications and in 78.83% of the multi-label classifications. We examine the advantages of our classification in terms of its ability to better align research texts and output with disciplines, to classify them adequately in an automated way, and to capture the degree of interdisciplinarity in a publication, which enables downstream analytics such as assessing field interdisciplinarity. This system (a set of pretrained models) can serve as the backbone of an interactive system for indexing scientific publications.
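To make the tiered, modular setup and the single-label versus multi-label distinction concrete, here is a minimal sketch. It is not the authors' code. It assumes one plausible reading of "modularized": one classifier per parent node, applied top-down (discipline, then field, then subfield). The node names, the `fixed` stub scorers, the `classify`/`pick` helpers and the 0.5 threshold are all hypothetical stand-ins for trained CNN/RNN/Transformer models.

```python
from typing import Callable, Dict, List, Tuple

# A "model" is any function mapping an abstract to {child_label: probability}.
# In this sketch the hierarchy is implicit in which labels each per-node model scores.
Scorer = Callable[[str], Dict[str, float]]


def pick(scores: Dict[str, float], multi_label: bool, threshold: float) -> List[str]:
    """Single-label: the argmax label. Multi-label: every label at or above threshold."""
    top = max(scores, key=scores.get)
    if not multi_label:
        return [top]
    chosen = [label for label, p in scores.items() if p >= threshold]
    # Fall back to the top label so every abstract receives at least one label.
    return chosen or [top]


def classify(abstract: str, models: Dict[str, Scorer],
             multi_label: bool = False,
             threshold: float = 0.5) -> List[Tuple[str, str, str]]:
    """Walk the three tiers top-down, calling one model per parent node."""
    paths = []
    for discipline in pick(models["root"](abstract), multi_label, threshold):
        for field in pick(models[discipline](abstract), multi_label, threshold):
            for subfield in pick(models[field](abstract), multi_label, threshold):
                paths.append((discipline, field, subfield))
    return paths


def fixed(scores: Dict[str, float]) -> Scorer:
    """Hypothetical stub standing in for a trained per-node classifier."""
    return lambda abstract: scores


# Toy label tree (the real system has 44 disciplines, 738 fields, 1,501 subfields).
models: Dict[str, Scorer] = {
    "root": fixed({"Computer Science": 0.7, "Biology": 0.6}),
    "Computer Science": fixed({"Machine Learning": 0.9, "Databases": 0.2}),
    "Biology": fixed({"Genetics": 1.0}),
    "Machine Learning": fixed({"Deep Learning": 0.8, "Reinforcement Learning": 0.3}),
    "Databases": fixed({"Query Processing": 1.0}),
    "Genetics": fixed({"Genomics": 1.0}),
}

single = classify("an abstract about neural sequence models for DNA", models)
multi = classify("an abstract about neural sequence models for DNA", models,
                 multi_label=True)
# A crude, hypothetical interdisciplinarity proxy: distinct disciplines assigned.
n_disciplines = len({d for d, _, _ in multi})
print(single)
print(multi)
print(n_disciplines)
```

With these stub scores, the single-label run returns one path, `("Computer Science", "Machine Learning", "Deep Learning")`. The multi-label run also keeps `("Biology", "Genetics", "Genomics")`, so the abstract touches two disciplines. Training one model per parent node keeps each model's output space small, and the models can be trained independently. That is one way a label set of this size could be trained in a modular, distributed fashion.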