ClaimFlow: Tracing the Evolution of Scientific Claims in NLP

Scientific papers advance $\textit{claims}$ that later work supports, extends, or sometimes refutes. Yet existing methods for citation and claim analysis capture only fragments of this dialogue. In this work, we make these interactions explicit at the level of individual scientific claims. We introduce $\texttt{ClaimFlow}$, a claim-centric view of the NLP literature, built from $1{,}617$ ACL Anthology papers ($1979$--$2025$) that are manually annotated with $5{,}689$ claims and $4{,}871$ cross-paper claim relations. Each relation indicates whether a citing paper $\texttt{supports}$, $\texttt{extends}$, $\texttt{qualifies}$, or $\texttt{refutes}$ a cited claim, or references it as $\texttt{background}$. Building on $\texttt{ClaimFlow}$, we define a new task -- $\textit{Claim Relation Classification}$ -- which requires models to infer the scientific stance toward a cited claim from the text and citation context. Evaluating neural models and large language models on this task, we report baseline performance of $0.81$ macro-F1, suggesting that the task is tractable while leaving room for improvement. We then scale this framework to $\sim 13$k NLP papers to study claim evolution across decades of NLP research. We show that $63.5\%$ of claims are never reused and only $11.1\%$ are ever challenged. Widely propagated claims are more often $\textit{reshaped}$ through qualification and extension than supported or refuted. Overall, $\texttt{ClaimFlow}$ offers a lens for examining how ideas shift and mature within NLP.
