This study examines the tendency to cite older work across 20 fields of study over 43 years (1980--2023). We place NLP's propensity to cite older work in the context of these 20 fields to analyze whether its temporal citation patterns resemble theirs over time or differ observably. Our analysis, based on a dataset of approximately 240 million papers, reveals a broader scientific trend: many fields (e.g., psychology, computer science) cite older work markedly less than they once did. We term this decline a 'citation age recession', by analogy with how economists define periods of reduced economic activity. The trend is strongest in NLP and ML research (-12.8% and -5.5% in citation age from previous peaks). Our results suggest that the shift toward citing more recent work is not directly driven by growth in publication rates: the decline persists even when controlling for the increase in the volume of papers (-3.4% across fields; -5.2% in humanities; -5.5% in formal sciences). Our findings raise questions about the scientific community's engagement with past literature, particularly in NLP, and about the potential consequences of neglecting older but relevant research. The data and a demo showcasing our results are publicly available.