Software development has become essential to scientific research, but its relationship to traditional metrics of scholarly credit remains poorly understood. We develop a dataset of approximately 140,000 paired research articles and code repositories, and a predictive model that matches research article authors with software repository developer accounts. We use this dataset to investigate how software development activities influence credit allocation in collaborative scientific settings. Our findings reveal significant patterns distinguishing software contributions from traditional authorship credit. We find that $\sim$30\% of articles include non-author code contributors -- individuals who participated in software development but received no authorship recognition. While code-contributing authors are associated with a $\sim$4.2\% increase in article citations, this effect becomes non-significant when controlling for domain, article type, and open access status. First authors are significantly more likely to be code contributors than authors in other positions. Notably, we identify a negative relationship between coding frequency and scholarly impact metrics: authors who contribute code more frequently exhibit progressively lower h-indices than their non-coding colleagues, even when controlling for publication count, author position, domain, and article type. These results suggest a disconnect between software contributions and credit, with important implications for institutional reward structures and science policy.