Citation analysis is used extensively in the bibliometrics literature to assess the impact of individual works, researchers, institutions, and even entire fields of study. In this paper, we analyze citations in one large and influential field within computer science, namely computer systems. Using citation data from a cross-sectional sample of 2,088 papers in 50 systems conferences from 2017, we examine four research questions: the overall distribution of systems citations; their evolution over time; the differences between citation databases (Google Scholar and Scopus) for systems papers; and the characteristics of self-citations in the field. We find that only 1.5% of papers remained uncited after five years, while 12.8% accrued at least 100 citations, both statistics comparing favorably to many other scientific fields. The most cited subfields and conference areas within systems were security, databases, and computer architecture. Most papers received their first citation within a year of publication, and the median citation count continued to grow at an almost linear rate over five years, with only a few papers peaking before that. We also find that early citations could be linked to papers with a freely available preprint, or may be primarily composed of self-citations. The ratio of self-citations to total citations starts relatively high for most papers but appears to stabilize by 12 to 18 months, at which point highly cited papers revert to predominantly external citations. Past self-citation count (taken from each paper's reference list) appears to bear little if any relationship to the future self-citation count of each paper. The choice of citation database also makes little difference in relative citation comparisons, despite marked differences in absolute counts.