This study proposes Argument Rarity-based Originality Assessment (AROA), a framework for automatically evaluating argumentative originality in student essays. AROA defines originality as rarity within a reference corpus and evaluates it through four complementary components: structural rarity, claim rarity, evidence rarity, and cognitive depth, quantified via density estimation and integrated with quality adjustment. Experiments using 1,375 human essays and 1,000 AI-generated essays on two argumentative topics revealed three key findings. First, a strong negative correlation (r = -0.67) between text quality and claim rarity demonstrates a quality-originality trade-off. Second, while AI essays achieved near-perfect quality scores (Q = 0.998), their claim rarity was approximately one-fifth of human levels (AI: 0.037, human: 0.170), indicating that LLMs can reproduce argumentative structure but not semantic originality. Third, the four components showed low mutual correlations (r = 0.06--0.13 between structural and semantic dimensions), confirming that they capture genuinely independent aspects of originality. These results suggest that writing assessment in the AI era must shift from quality to originality.
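The abstract's core mechanism, scoring originality as rarity (low density) under a density estimate fit to a reference corpus, then folding in a quality adjustment, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian bandwidth `h`, the `1 / (1 + density)` rarity mapping, and the multiplicative quality adjustment are all hypothetical choices for exposition; AROA's actual kernels, embeddings, and integration weights are not specified here.

```python
import numpy as np

def rarity_scores(reference, candidates, h=0.5):
    """Score candidates by rarity under a Gaussian KDE fit on reference embeddings.

    reference:  (n, d) array, e.g. claim embeddings from the reference corpus
    candidates: (m, d) array of embeddings to score
    Returns values in (0, 1]: low density (rare) maps toward 1.
    """
    d = reference.shape[1]
    # Pairwise squared distances between candidates and reference points: (m, n)
    diffs = candidates[:, None, :] - reference[None, :, :]
    sq = np.sum(diffs ** 2, axis=-1)
    # Gaussian kernel density estimate with bandwidth h (isotropic, illustrative)
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)
    density = np.exp(-sq / (2.0 * h ** 2)).mean(axis=1) / norm
    # Map density to a bounded rarity score; rarer points score higher
    return 1.0 / (1.0 + density)

def quality_adjusted_originality(rarity, quality):
    # Hypothetical multiplicative integration of rarity and a quality score in [0, 1];
    # the abstract states rarity is "integrated with quality adjustment" without details.
    return rarity * quality
```

A candidate embedding far from the reference cloud receives a rarity score near 1, while one at the center of the cloud scores lower; the quality factor then discounts rare but low-quality claims, reflecting the quality-originality trade-off the study reports.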