Evaluating researchers' output is vital for hiring committees and funding bodies, and it is usually measured by scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is especially critical because citations take time to accumulate and the h-index takes time to grow. Hence, predicting the h-index can help reveal researchers' scientific impact. In addition, identifying the factors that influence this prediction is helpful for researchers seeking ways to improve their impact. This study investigates the effect of author-, paper-, and venue-specific features on the future h-index. For this purpose, we used machine learning methods to predict the h-index and feature analysis techniques to better understand the impact of each feature. Using bibliometric data from Scopus, we defined and extracted two main groups of features. The first group, which we call 'prior impact-based features', relates to prior scientific impact and includes the number of publications, received citations, and h-index. The second group, 'non-impact-based features', contains features related to author, co-authorship, paper, and venue characteristics. We explored the importance of these features in predicting the h-index for researchers in three different career phases. We also examined the temporal dimension of prediction performance for the different feature categories to find out which features are more reliable for long-term and short-term prediction. In addition, we considered the gender of the authors to examine the role of this author characteristic in the prediction task. Our findings showed that gender has only a very slight effect on predicting the h-index. We found that, in the short term, non-impact-based features are more robust predictors for younger scholars than for senior ones. Moreover, prior impact-based features lose more of their predictive power than other features in the long term.