Funding agencies largely rely on topic matching between domain experts and research proposals to assign proposal reviewers. As proposals become increasingly interdisciplinary, it is challenging to profile the interdisciplinary nature of a proposal and, thereafter, to find expert reviewers with an appropriate set of expertise. An essential step in solving this challenge is to accurately model and classify the interdisciplinary labels of a proposal. Existing methodological and application-related literature, such as text classification and proposal classification, is insufficient for jointly addressing three key issues unique to interdisciplinary proposal data: 1) the hierarchical structure of a proposal's discipline labels, from coarse-grained to fine-grained, e.g., from information science to AI to fundamentals of AI; 2) the heterogeneous semantics of the main textual parts, which play different roles in a proposal; and 3) the imbalance in the number of proposals between non-interdisciplinary and interdisciplinary research. Can we address all three issues simultaneously to understand a proposal's interdisciplinary nature? In response to this question, we propose a hierarchical mixup multi-label classification framework, which we call H-MixUp. H-MixUp leverages a transformer-based semantic information extractor and a GCN-based interdisciplinary knowledge extractor for the first and second issues. To address the third issue, H-MixUp develops a fused training method that combines Word-level MixUp, Word-level CutMix, Manifold MixUp, and Document-level MixUp.
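The abstract does not spell out how the MixUp variants act on proposal text. The following is a rough illustration: a minimal pure-Python sketch of three of them on toy token embeddings. It follows the standard MixUp recipe of sampling λ from Beta(α, α) and assumes multi-hot targets in which each fine-grained discipline label is expanded to all of its ancestors. Every function name, the toy hierarchy, and the use of mean pooling for the document-level variant are hypothetical choices for illustration, not H-MixUp's implementation. Manifold MixUp applies the same interpolation to hidden states at a randomly chosen encoder layer. It is omitted here because it requires a model.

```python
import random
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical toy representation: one embedding vector per token,
# with both documents padded to the same number of tokens.
Vector = List[float]
Tokens = List[Vector]


def expand_with_ancestors(labels: Iterable[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Add every ancestor of each label (coarse-to-fine hierarchy assumption)."""
    out: Set[str] = set()
    for node in labels:
        # Walk up to the root; stop early once this branch is already present.
        while node is not None and node not in out:
            out.add(node)
            node = parent.get(node)
    return out


def multi_hot(labels: Set[str], vocab: List[str]) -> Vector:
    """Encode a label set as a multi-hot target vector over a fixed vocabulary."""
    return [1.0 if v in labels else 0.0 for v in vocab]


def mix(a: Vector, b: Vector, lam: float) -> Vector:
    """Convex combination lam * a + (1 - lam) * b."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def sample_lambda(alpha: float, rng: random.Random) -> float:
    """Standard MixUp mixing weight drawn from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def word_mixup(emb_a: Tokens, emb_b: Tokens, y_a: Vector, y_b: Vector,
               lam: float) -> Tuple[Tokens, Vector]:
    """Interpolate token embeddings position by position; mix targets with the same lam."""
    tokens = [mix(ta, tb, lam) for ta, tb in zip(emb_a, emb_b)]
    return tokens, mix(y_a, y_b, lam)


def word_cutmix(emb_a: Tokens, emb_b: Tokens, y_a: Vector, y_b: Vector,
                start: int, length: int) -> Tuple[Tokens, Vector]:
    """Paste a contiguous token span from B into A; weight targets by tokens kept from A."""
    n = len(emb_a)
    end = min(start + length, n)
    tokens = [list(t) for t in emb_a]
    tokens[start:end] = [list(t) for t in emb_b[start:end]]
    lam = 1.0 - (end - start) / n  # fraction of tokens that still come from A
    return tokens, mix(y_a, y_b, lam)


def mean_pool(emb: Tokens) -> Vector:
    """Average token vectors into a single document vector (assumed pooling)."""
    n = len(emb)
    return [sum(col) / n for col in zip(*emb)]


def document_mixup(emb_a: Tokens, emb_b: Tokens, y_a: Vector, y_b: Vector,
                   lam: float) -> Tuple[Vector, Vector]:
    """Interpolate pooled document representations instead of individual tokens."""
    return mix(mean_pool(emb_a), mean_pool(emb_b), lam), mix(y_a, y_b, lam)


# Usage on a toy hierarchy taken from the abstract's example.
parent = {"fundamentals of AI": "AI", "AI": "information science",
          "databases": "information science", "information science": None}
vocab = ["information science", "AI", "fundamentals of AI", "databases"]
y_a = multi_hot(expand_with_ancestors({"fundamentals of AI"}, parent), vocab)
y_b = multi_hot(expand_with_ancestors({"databases"}, parent), vocab)
emb_a = [[1.0, 0.0]] * 4
emb_b = [[0.0, 1.0]] * 4
cut_tokens, cut_y = word_cutmix(emb_a, emb_b, y_a, y_b, start=1, length=2)
```

Because the inputs and the hierarchy-expanded targets are mixed with the same weight, the labels stay consistent across levels. In the CutMix example, the shared root "information science" keeps a target of 1.0. The other labels receive fractional targets of 0.5.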