Citation is the primary means by which modern scientific writing discusses and builds on past work. Collectively, citing a diverse set of papers (across time and areas of study) indicates how widely a community is reading. Yet little work has examined broad temporal patterns of citation. This work systematically and empirically examines: How far back in time do we tend to go to cite papers? How has that changed over time, and what factors correlate with this citational attention/amnesia? We chose NLP as our domain of interest and analyzed approximately 71.5K papers to show and quantify several key trends in citation. Notably, around 62% of cited papers are from the five years immediately preceding publication, whereas only about 17% are more than ten years old. Furthermore, we show that the median age and age diversity of cited papers increased steadily from 1990 to 2014, but the trend has since reversed, and current NLP papers have an all-time low temporal citation diversity. Finally, we show that, unlike in the 1990s, the highly cited papers of the last decade were also the papers with the least citation diversity, likely contributing to the intense (and arguably harmful) recency focus. Code, data, and a demo are available on the project homepage.
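The age-based statistics quoted above (the share of citations within five years, the share older than ten years, and the median age) can be illustrated with a minimal sketch. It assumes that the age of a citation is the citing paper's publication year minus the cited paper's publication year, and that "within five years" includes an age of exactly five. Both are assumptions for illustration, not definitions taken from the paper. The function names and the toy reference list are hypothetical, and the paper's age diversity measure is not reproduced here because its definition is not given in this text.

```python
from statistics import median


def citation_ages(citing_year, cited_years):
    """Age of each cited paper: citing year minus cited year (assumed definition)."""
    return [citing_year - year for year in cited_years]


def summarize(ages):
    """Summary statistics over a non-empty list of citation ages."""
    n = len(ages)
    return {
        # Assumption: an age of exactly 5 counts as "within five years".
        "share_within_5_years": sum(age <= 5 for age in ages) / n,
        "share_older_than_10_years": sum(age > 10 for age in ages) / n,
        "median_age": median(ages),
    }


# Hypothetical toy data: one paper from 2023 and the years of its references.
ages = citation_ages(2023, [2022, 2021, 2020, 2019, 2015, 2010, 2001, 1995])
stats = summarize(ages)
print(stats)
```

Applied to every paper in a corpus and grouped by publication year, the same summary would give the kind of year-by-year trend the abstract describes.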