Under the data-driven research paradigm, research software has come to play a crucial role in nearly every stage of scientific inquiry. Scholars advocate for the formal citation of software in academic publications, treating it on par with traditional research outputs. However, software is rarely cited consistently: a single software entity can be cited as different objects, and its citations can change over time. These issues are largely overlooked in existing empirical research on software citation. To fill these gaps, the present study compares and analyzes a longitudinal dataset of the citation formats of all R packages, collected in 2021 and 2022. Our goal is to understand how R packages, important members of the open-source software family, are cited and how their citations evolve over time. In particular, we investigate the document types underlying the citations and which metadata elements in the citation formats changed over time. Furthermore, we offer an in-depth analysis of the disciplinarity of journal articles cited as software (software papers). Through this research, we aim to contribute to a better understanding of the complexities of software citation and to inform future software citation policies and infrastructure.