References, the mechanism scientists rely on to signal previous knowledge, have lately turned into widely used and misused measures of scientific impact. Yet when a discovery becomes common knowledge, its citations suffer from obliteration by incorporation. This leads to the concept of a hidden citation: a clear textual credit to a discovery without a reference to the publication embodying it. Here, we rely on unsupervised interpretable machine learning applied to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries hidden citations outnumber explicit citations, and that they emerge regardless of publishing venue and discipline. We show that the prevalence of hidden citations is driven not by citation counts but by the degree of discourse on the topic within the text of the manuscripts, indicating that the more a discovery is discussed, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.