As Large Language Models (LLMs) have become capable of effortlessly generating high-quality text, traditional quality-focused writing assessment is losing its significance. If the essential goal of education is to foster critical thinking and original perspectives, assessment must also shift its paradigm from quality to originality. This study proposes Argument Rarity-based Originality Assessment (AROA), a framework for automatically evaluating argumentative originality in student essays. AROA defines originality as rarity within a reference corpus and evaluates it through four complementary components: structural rarity, claim rarity, evidence rarity, and cognitive depth. The framework quantifies the rarity of each component using density estimation and integrates the scores with a quality adjustment mechanism, thereby treating quality and originality as independent evaluation axes. Experiments on human-written and AI-generated essays revealed a strong negative correlation between quality and claim rarity, demonstrating a quality-originality trade-off in which higher-quality texts tend to rely on typical claim patterns. Furthermore, while AI essays achieved structural complexity comparable to that of human essays, their claim rarity was substantially lower, indicating that LLMs can reproduce the form of argumentation but are limited in the originality of its content.
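The density-estimation idea above can be illustrated with a minimal sketch: rarity of a component (e.g., a claim embedding) is scored as the negative log of its estimated density under a Gaussian kernel density estimate over the reference corpus. The embedding representation, the `bandwidth` parameter, and the function names here are illustrative assumptions, not AROA's actual implementation.

```python
import math

def kde_log_density(x, reference, bandwidth=1.0):
    """Log-density of vector x under a Gaussian KDE fitted to the
    reference corpus vectors (hypothetical sketch, not AROA's code)."""
    d = len(x)
    n = len(reference)
    # Normalizing constant of an isotropic Gaussian kernel in d dimensions.
    norm = 1.0 / (n * (bandwidth * math.sqrt(2 * math.pi)) ** d)
    total = sum(
        math.exp(-sum((xi - ri) ** 2 for xi, ri in zip(x, r))
                 / (2 * bandwidth ** 2))
        for r in reference
    )
    return math.log(norm * total)

def rarity(x, reference, bandwidth=1.0):
    """Rarity as negative log density: the sparser the corpus region
    around x, the higher the rarity score."""
    return -kde_log_density(x, reference, bandwidth)
```

Under this sketch, a claim embedding that falls in a dense region of the reference corpus receives a low rarity score, while an embedding far from all reference points receives a high one; per-component rarities could then be combined with a quality-adjustment term along the lines the abstract describes.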