The accelerating pace of scientific publication makes it difficult to identify truly original research among incremental work. We propose a framework for estimating the conceptual novelty of research papers by combining semantic representation learning with retrieval-based comparison against prior literature. We model novelty both as a binary classification task (novel vs. non-novel) and as a pairwise ranking task (comparative novelty), enabling absolute as well as relative assessments. Experiments benchmark three model scales, ranging from compact domain-specific encoders to a zero-shot frontier model. Results show that fine-tuned lightweight models outperform larger zero-shot models despite having far fewer parameters, indicating that task-specific supervision matters more than scale for conceptual novelty estimation. We further deploy the best-performing model as an online system that supports public interaction and real-time novelty scoring.
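As a rough illustration of the retrieval-based comparison described above, the sketch below embeds an abstract, retrieves its most similar neighbors in a small prior-literature corpus, and scores novelty as dissimilarity to them. The encoder name, the toy corpus, and the 1-minus-mean-similarity scoring rule are illustrative assumptions, not the exact pipeline or models evaluated in the paper.

```python
# Minimal sketch of retrieval-based novelty scoring (illustrative only).
# The encoder, corpus, and scoring rule are assumptions, not the paper's pipeline.
import numpy as np
from sentence_transformers import SentenceTransformer

# Hypothetical prior-literature corpus (in practice: abstracts of earlier papers).
prior_abstracts = [
    "A transformer-based model for citation recommendation.",
    "Graph neural networks for predicting paper impact.",
    "A survey of automatic keyphrase extraction methods.",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # compact general-purpose encoder
prior_emb = encoder.encode(prior_abstracts, normalize_embeddings=True)

def novelty_score(abstract: str, k: int = 3) -> float:
    """Score = 1 - mean cosine similarity to the k most similar prior abstracts."""
    query = encoder.encode([abstract], normalize_embeddings=True)
    sims = prior_emb @ query.T                  # cosine similarities (vectors are normalized)
    top_k = np.sort(sims.ravel())[-k:]          # keep the k nearest prior works
    return float(1.0 - top_k.mean())

print(novelty_score("Estimating conceptual novelty of papers via retrieval."))
```

A supervised variant, as the abstract suggests, would fine-tune such an encoder on labeled novel/non-novel pairs rather than relying on raw similarity alone.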