Novelty assessment is a central yet understudied aspect of peer review, particularly in high-volume fields like NLP where reviewer capacity is increasingly strained. We present a structured approach to automated novelty evaluation that models expert reviewer behavior through three stages: content extraction from submissions, retrieval and synthesis of related work, and structured comparison for evidence-based assessment. Our method is informed by a large-scale analysis of human-written novelty reviews and captures key patterns such as independent claim verification and contextual reasoning. Evaluated on 182 ICLR 2025 submissions with human-annotated reviewer novelty assessments, the approach achieves 86.5% alignment with human reasoning and 75.3% agreement on novelty conclusions, substantially outperforming existing LLM-based baselines. The method produces detailed, literature-aware analyses and improves consistency over ad hoc reviewer judgments. These results highlight the potential for structured LLM-assisted approaches to support more rigorous and transparent peer review without displacing human expertise. Data and code are publicly available.
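To make the three-stage pipeline concrete, the sketch below shows one possible skeleton, assuming each stage is ultimately backed by an LLM and a literature-retrieval service. All names here (extract_contributions, retrieve_related_work, compare_for_novelty) are hypothetical illustrations, not the paper's actual API, and the stage bodies are placeholder stubs.

```python
# Minimal, hypothetical sketch of the three-stage novelty pipeline.
# Stage logic is stubbed out; in a real system each stage would call
# an LLM and a scholarly search index rather than these placeholders.
from dataclasses import dataclass


@dataclass
class NoveltyReport:
    claims: list[str]              # contributions extracted from the submission
    related: dict[str, list[str]]  # claim -> retrieved related-work snippets
    verdicts: dict[str, str]       # claim -> evidence-based novelty judgment


def extract_contributions(submission_text: str) -> list[str]:
    """Stage 1: extract the submission's claimed contributions."""
    # Placeholder heuristic; an LLM prompt would do this in practice.
    return [line for line in submission_text.splitlines() if line.startswith("We ")]


def retrieve_related_work(claim: str) -> list[str]:
    """Stage 2: retrieve and synthesize related work for one claim."""
    # Placeholder; a real system would query a literature search service.
    return [f"[stub] retrieved abstract relevant to: {claim!r}"]


def compare_for_novelty(claim: str, evidence: list[str]) -> str:
    """Stage 3: structured comparison yielding an evidence-based verdict."""
    # Placeholder rule; a real system would prompt an LLM to contrast
    # the claim against each retrieved item and justify its verdict.
    return "novel" if not evidence else "needs expert check against retrieved work"


def assess_novelty(submission_text: str) -> NoveltyReport:
    claims = extract_contributions(submission_text)
    related = {c: retrieve_related_work(c) for c in claims}
    verdicts = {c: compare_for_novelty(c, related[c]) for c in claims}
    return NoveltyReport(claims, related, verdicts)


if __name__ == "__main__":
    print(assess_novelty("We propose a structured novelty-assessment pipeline."))
```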