This study examines the tendency to cite older work across 20 fields of study over 43 years (1980--2023). We place NLP's propensity to cite older work in the context of these 20 fields to analyze whether its temporal citation patterns resemble theirs or differ over time. Our analysis, based on a dataset of approximately 240 million papers, reveals a broader scientific trend: many fields (e.g., psychology, computer science) cite older work markedly less than they once did. We term this decline a 'citation age recession', analogous to how economists define periods of reduced economic activity. The trend is strongest in NLP and ML research (-12.8% and -5.5% in citation age from previous peaks, respectively). Our results suggest that the shift toward citing more recent work is not directly driven by growth in publication rates (-3.4% across fields; -5.2% in humanities; -5.5% in formal sciences), even when controlling for the increase in the volume of papers. Our findings raise questions about the scientific community's engagement with past literature, particularly in NLP, and about the potential consequences of neglecting older but relevant research. The data and a demo showcasing our results are publicly available.