Evaluating the significance of a paper is pivotal yet challenging for the scientific community. Although citation counts are the most commonly used proxy for this purpose, they are widely criticized for failing to accurately reflect a paper's true impact. In this work, we propose a causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. Specifically, we encode each paper with text embeddings from large language models (LLMs), retrieve similar papers by cosine similarity, and synthesize a counterfactual sample as a weighted average of those similar papers, with weights given by their similarity values. We call the resulting metric CausalCite and use it as a causal formulation of paper citations. We demonstrate its effectiveness on various criteria, such as its high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, its correspondence with (test-of-time) awards for past papers, and its stability across various sub-fields of AI. We also report a set of findings that suggest how future researchers can use our metric to better understand a paper's quality. Our code and data are at https://github.com/causalNLP/causal-cite.
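The matching step described above (embed, retrieve by cosine similarity, take a similarity-weighted average) can be illustrated with a minimal sketch. This is not the released implementation. The function names, the choice of `k`, the normalization of weights to sum to one, and the use of a single scalar outcome per candidate paper are all assumptions made for illustration, and small hand-written vectors stand in for LLM embeddings.

```python
import numpy as np


def cosine_similarity(query, candidates):
    """Cosine similarity between one vector and each row of a matrix."""
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    return candidates @ query


def synthesize_counterfactual(query_emb, cand_embs, cand_outcomes, k=2):
    """Similarity-weighted average outcome of the k most similar candidates.

    Hypothetical helper: `k` and the weight normalization are illustrative
    assumptions, not values taken from the paper.
    """
    sims = cosine_similarity(query_emb, cand_embs)
    top = np.argsort(sims)[::-1][:k]        # indices of the k most similar rows
    weights = sims[top] / sims[top].sum()   # assumes the top-k similarities are positive
    counterfactual = float(weights @ cand_outcomes[top])
    return counterfactual, top, weights


# Toy stand-ins for paper embeddings and a scalar outcome (e.g. a citation count).
query_emb = np.array([1.0, 0.0])
cand_embs = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
cand_outcomes = np.array([10.0, 100.0, 20.0])

cf, top, weights = synthesize_counterfactual(query_emb, cand_embs, cand_outcomes, k=2)
print("matched candidates:", top.tolist())
print("weights:", weights.round(3).tolist())
print("counterfactual outcome:", round(cf, 3))
```

In this toy setup the orthogonal candidate is excluded by the top-k cut, and the remaining two candidates contribute in proportion to their similarity to the query paper.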