This study proposes Argument Rarity-based Originality Assessment (AROA), a framework for automatically evaluating argumentative originality in student essays. AROA defines originality as rarity within a reference corpus and evaluates it through four complementary components: structural rarity, claim rarity, evidence rarity, and cognitive depth, each quantified via density estimation and combined with a quality-adjustment factor. Experiments on 1,375 human-written essays and 1,000 AI-generated essays across two argumentative topics yielded three key findings. First, a strong negative correlation ($r = -0.67$) between text quality and claim rarity reveals a quality-originality trade-off. Second, although AI essays achieved near-perfect quality scores ($Q = 0.998$), their claim rarity was roughly one-fifth of the human level (AI: 0.037, human: 0.170), indicating that LLMs can reproduce argumentative structure but not semantic originality. Third, the four components showed low mutual correlations ($r = 0.06$--$0.13$ between the structural and semantic dimensions), confirming that they capture genuinely independent aspects of originality. These results suggest that writing assessment in the AI era must shift its focus from quality to originality.
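The core idea of rarity-as-originality can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes essay components are represented as embedding vectors, estimates their density under a reference corpus with a simple Gaussian kernel density estimator, and maps low density to a high rarity score. The bandwidth, the density-to-rarity mapping, and the `alpha`-weighted quality adjustment are all hypothetical choices for illustration.

```python
import numpy as np

def kde_log_density(x, ref, bandwidth=0.5):
    """Log of the average Gaussian kernel density of point x under the
    reference embeddings ref (shape: n_samples x n_dims)."""
    d = ref.shape[1]
    # Squared distances scaled by the kernel bandwidth.
    sq = np.sum((ref - x) ** 2, axis=1) / (2 * bandwidth ** 2)
    # Gaussian normalization constant in d dimensions.
    log_norm = -0.5 * d * np.log(2 * np.pi * bandwidth ** 2)
    # log-mean-exp over kernels, computed stably.
    return log_norm + np.logaddexp.reduce(-sq) - np.log(len(ref))

def rarity(x, ref, bandwidth=0.5):
    """Map density to a rarity score in (0, 1]: rarer points score higher."""
    return float(np.exp(-np.exp(kde_log_density(x, ref, bandwidth))))

def quality_adjusted_originality(rarity_score, quality, alpha=1.0):
    """Hypothetical combination: damp rarity by a quality factor so that
    incoherent-but-unusual text does not score as original."""
    return rarity_score * quality ** alpha
```

A claim embedding far from the dense regions of the reference corpus receives a rarity score near 1, while one sitting inside a common cluster scores lower; the quality factor then penalizes rare-but-poorly-written text, reflecting the quality-originality trade-off the abstract reports.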