References, the mechanism scientists rely on to signal prior knowledge, have lately become widely used, and often misused, measures of scientific impact. Yet when a discovery becomes common knowledge, its citations suffer from obliteration by incorporation. This leads to the concept of a hidden citation: a clear textual credit to a discovery without a reference to the publication embodying it. Here, we apply unsupervised, interpretable machine learning to the full text of each paper to systematically identify hidden citations. We find that for influential discoveries, hidden citations outnumber citation counts and emerge regardless of publishing venue and discipline. We show that the prevalence of hidden citations is driven not by citation counts but by the extent of discourse on the topic within the text of manuscripts, indicating that the more a discovery is discussed, the less visible it is to standard bibliometric analysis. Hidden citations indicate that bibliometric measures offer a limited perspective on quantifying the true impact of a discovery, raising the need to extract knowledge from the full text of the scientific corpus.