Identifying which newly published scientific papers are likely to become highly cited is important for prioritizing research attention, supporting editorial decisions, and guiding the allocation of scientific resources, particularly under cold-start conditions where little direct evidence is available at publication time. In this work, we formulate impact prediction as a cohort-normalized top-P% classification task and compare graph-based and LLM-based approaches under a unified framework. We construct citation and textual-similarity graphs under temporal constraints and generate Node2Vec representations, used either alone or combined with OpenAI text embeddings. The best supervised configuration combines directed citation graphs with textual embeddings, reaching an AUC of approximately 0.84-0.85. We also evaluate a GPT-based GraphRAG setup, using GPT 5.5 and 5.4 Nano, in which graph neighborhoods serve as contextual evidence for prediction. Although the LLM-based approach achieves high performance, retrieved context does not consistently improve results: target-only prompts often perform as well as or better than GraphRAG prompts, reaching the 0.87 mark. These findings indicate that structural and textual signals are complementary for supervised prediction, while retrieval augmentation must be carefully evaluated against simpler LLM baselines.
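As an illustration of the cohort-normalized top-P% formulation, the following is a minimal sketch, not the procedure used in this work. It assumes that a cohort is a publication year and that ties are broken by input order; the abstract specifies neither. The function name `top_p_labels` and the toy records are hypothetical.

```python
import math
from collections import defaultdict


def top_p_labels(papers, p):
    """Return {paper_id: 0 or 1}, with 1 for papers in the top p% of their cohort.

    papers: iterable of (paper_id, cohort, citations) tuples.
    p: percentage in (0, 100].
    Assumption: at least one paper per cohort is labeled positive.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")

    # Group papers by cohort so that ranking happens within each cohort only.
    by_cohort = defaultdict(list)
    for paper_id, cohort, citations in papers:
        by_cohort[cohort].append((paper_id, citations))

    labels = {}
    for members in by_cohort.values():
        # Number of positives in this cohort, rounded up.
        k = max(1, math.ceil(len(members) * p / 100))
        # Stable sort: papers with equal citation counts keep their input order.
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        for rank, (paper_id, _) in enumerate(ranked):
            labels[paper_id] = 1 if rank < k else 0
    return labels


# Toy data (hypothetical): (paper_id, publication_year, citation_count)
papers = [
    ("a", 2019, 120), ("b", 2019, 15), ("c", 2019, 3), ("d", 2019, 40),
    ("e", 2023, 9), ("f", 2023, 2), ("g", 2023, 1), ("h", 2023, 4),
]
labels = top_p_labels(papers, 25)
```

In this toy example, paper "e" (9 citations, 2023 cohort) is labeled positive while paper "d" (40 citations, 2019 cohort) is not. This is the intended effect of normalizing within cohorts: recent papers are not penalized for having had less time to accumulate citations.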