Software is a central part of modern science, and knowledge of its use is crucial for the scientific community with respect to reproducibility and attribution to its developers. Several studies have investigated in-text mentions of software and their quality, while the quality of formal software citations has only been analyzed superficially. This study performs an in-depth evaluation of formal software citation based on a set of manually annotated software references. It examines which resources are cited for software usage, to what extent they allow proper identification of software and its specific version, how this information is made available by scientific publishers, and how well it is represented in large-scale bibliographic databases. The results show that software articles are the most cited resource for software, while direct software citations are better suited for identifying software versions. Moreover, we found current practices of both publishers and bibliographic databases to be unsuited to representing these direct software citations, hindering large-scale analyses such as assessing software impact. We argue that current practices for representing software citations (the recommended way to cite software according to current citation standards) stand in the way of their adoption by the scientific community, and urge providers of bibliographic data to explicitly model scientific software.