Evaluating the significance of a paper is pivotal yet challenging for the scientific community. Although citation count is the most commonly used proxy for significance, it is widely criticized for failing to reflect a paper's true impact accurately. In this work, we propose a causal inference method, TextMatch, which adapts the traditional matching framework to high-dimensional text embeddings. Specifically, we encode each paper with text embeddings from large language models (LLMs), retrieve similar papers by cosine similarity, and synthesize a counterfactual sample as the weighted average of those similar papers, weighted by their similarity values. We apply the resulting metric, called CausalCite, as a causal formulation of paper citations. We demonstrate its effectiveness on several criteria: high correlation with paper impact as reported by scientific experts on a previous dataset of 1K papers, (test-of-time) awards for past papers, and stability across various sub-fields of AI. We also provide a set of findings that suggest ways for future researchers to use our metric to better understand a paper's quality. Our code and data are at https://github.com/causalNLP/causal-cite.
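The matching step described above (embed, retrieve by cosine similarity, average with similarity weights) can be illustrated with a minimal sketch. This is not the released implementation: the function names, the toy two-dimensional embeddings, the choice of k, the restriction to positive similarities, and the use of a single numeric outcome per paper are all assumptions made for illustration. In practice the embeddings would come from an LLM encoder.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def synthesize_counterfactual(target, candidates, k=2):
    """Similarity-weighted average outcome of the k candidates closest to `target`.

    `candidates` is a list of (embedding, outcome) pairs. The outcome is a
    hypothetical numeric quantity attached to each candidate paper; what
    exactly is averaged is an assumption of this sketch.
    """
    scored = [(cosine_similarity(target, emb), outcome) for emb, outcome in candidates]
    # Keep the k most similar candidates.
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Assumption: only candidates with positive similarity contribute.
    top = [(sim, outcome) for sim, outcome in top if sim > 0.0]
    total = sum(sim for sim, _ in top)
    if total == 0.0:
        raise ValueError("no candidate with positive similarity to the target")
    # Weighted average, with weights proportional to similarity.
    return sum(sim * outcome for sim, outcome in top) / total


if __name__ == "__main__":
    target_embedding = [1.0, 0.0]
    pool = [([1.0, 0.0], 10.0), ([0.0, 1.0], 100.0), ([1.0, 0.0], 20.0)]
    print(synthesize_counterfactual(target_embedding, pool, k=2))
```

In this toy pool, the two candidates identical to the target get equal weight and the orthogonal candidate is excluded, so the synthesized outcome is the plain mean of the two matched outcomes.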