Software is a central part of modern science, and knowing which software was used is crucial for the scientific community, both for reproducibility and for attributing credit to its developers. Several studies have investigated in-text mentions of software and their quality, while the quality of formal software citations has only been analyzed superficially. This study performs an in-depth evaluation of formal software citation based on a set of manually annotated software references. It examines which resources are cited for software usage, to what extent they allow proper identification of software and its specific version, how this information is made available by scientific publishers, and how well it is represented in large-scale bibliographic databases. The results show that software articles are the most cited resource for software, while direct software citations are better suited for identifying software versions. Moreover, we found current practices of both publishers and bibliographic databases to be unsuited to representing these direct software citations, which hinders large-scale analyses such as assessing software impact. We argue that current practices for representing software citations, the recommended way to cite software under current citation standards, stand in the way of their adoption by the scientific community, and we urge providers of bibliographic data to model scientific software explicitly.