Assessing a cited paper's impact is typically done by analyzing its citation context in isolation within the citing paper. While this focuses on the most directly relevant text, it prevents relative comparisons across all the works a paper cites. We propose Crystal, which instead jointly ranks all cited papers within a citing paper using large language models (LLMs). To mitigate LLMs' positional bias, we rank each list three times in randomized order and aggregate the impact labels through majority voting. This joint approach leverages the full citation context, rather than evaluating citations independently, to more reliably distinguish impactful references. Crystal outperforms a prior state-of-the-art impact classifier by +9.5% accuracy and +8.3% F1 on a dataset of human-annotated citations. Crystal is also more efficient, requiring fewer LLM calls, and outperforms prior baselines even when run with an open-weight model, enabling scalable, cost-effective citation impact analysis. In a case study of ACL Test-of-Time award-winning papers, we find that Crystal's impact characterizations align closely with long-term scientific recognition. We release Crystal-Bank, a 46.8k-paper dataset with rankings and impact labels, along with code.
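The randomized re-ranking and majority-vote aggregation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Crystal implementation: `rank_fn` is a hypothetical stand-in for the LLM call that sees all cited papers of one citing paper at once and returns an impact label per paper (the rankings themselves are omitted), the labels `"high"`, `"medium"` and `"low"` are placeholders for whatever label set is actually used, and breaking ties in favor of the earliest run is our own choice.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: an LLM-backed ranker that receives ALL cited papers of
# one citing paper in a given order and returns an impact label for each paper.
RankFn = Callable[[Sequence[str]], Dict[str, str]]


def majority_label(votes: List[str]) -> str:
    """Return the most frequent label; ties go to the label from the earliest run."""
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    best = max(counts.values())
    # votes are stored in run order, so the first match is the earliest run's label
    return next(label for label in votes if counts[label] == best)


def rank_with_voting(cited: Sequence[str], rank_fn: RankFn,
                     n_runs: int = 3, seed: int = 0) -> Dict[str, str]:
    """Jointly label all cited papers n_runs times in shuffled orders, then vote."""
    rng = random.Random(seed)
    votes: Dict[str, List[str]] = {paper: [] for paper in cited}
    for _ in range(n_runs):
        order = list(cited)
        rng.shuffle(order)  # new random order per run to counter positional bias
        labels = rank_fn(order)  # one joint call covering every cited paper
        for paper in cited:
            votes[paper].append(labels[paper])
    return {paper: majority_label(v) for paper, v in votes.items()}


if __name__ == "__main__":
    # Toy order-independent ranker: voting simply recovers the fixed labels.
    truth = {"A": "high", "B": "low", "C": "medium"}
    print(rank_with_voting(["A", "B", "C"], lambda order: {p: truth[p] for p in order}))
    # prints {'A': 'high', 'B': 'low', 'C': 'medium'}
```

Each run issues one call per citing paper rather than one per citation, which is where a joint ranking approach can save LLM calls compared with classifying every citation context separately.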