As AI-generated fiction becomes increasingly prevalent, questions of authorship and originality are becoming central to how written work is evaluated. Most existing work in this space focuses on identifying surface-level signatures of AI writing. We instead ask whether AI-generated stories can be distinguished from human ones without relying on stylistic signals, focusing on discourse-level narrative choices such as character agency and chronological discontinuity. We propose StoryScope, a pipeline that automatically induces a fine-grained, interpretable feature space of discourse-level narrative features across 10 dimensions. We apply StoryScope to a parallel corpus built from 10,272 writing prompts, each answered by a human author and by five LLMs, yielding 61,608 stories of roughly 5,000 words each, with 304 features extracted per story. Narrative features alone achieve 93.2% macro-F1 for human vs. AI detection and 68.4% macro-F1 for six-way authorship attribution, retaining over 97% of the performance of models that also include stylistic cues. A compact set of 30 core narrative features captures much of this signal: AI stories over-explain themes and favor tidy, single-track plots, while human stories frame protagonists' choices as more morally ambiguous and exhibit greater temporal complexity. Per-model fingerprint features enable six-way attribution: for example, Claude produces notably flat event escalation, GPT over-indexes on dream sequences, and Gemini defaults to external character description. We find that AI-generated stories cluster in a shared region of narrative space, while human-authored stories exhibit greater diversity. More broadly, these results suggest that differences in underlying narrative construction, not just writing style, can be used to separate original human-written works from AI-generated fiction.
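Both headline results are reported as macro-F1. As a minimal sketch (not the authors' evaluation code), macro-F1 is the unweighted mean of per-class F1 scores, so every authorship class counts equally in the six-way setting. The `retention` helper is hypothetical and assumes that "retaining over 97% of the performance" refers to a simple ratio of the narrative-only score to the score of a model that also sees stylistic cues.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct attribution for class t
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); define as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def retention(narrative_only, with_style):
    """Hypothetical: fraction of the style-aware score kept by narrative features alone."""
    return narrative_only / with_style


# Toy labels for illustration only; they are not data from the corpus.
y_true = ["human", "human", "claude", "gpt"]
y_pred = ["human", "claude", "claude", "gpt"]
print(round(macro_f1(y_true, y_pred), 4))
```

In this toy example, the "human" and "claude" classes each score an F1 of 2/3 and "gpt" scores 1, so the macro average is 7/9. A micro-averaged score would instead weight every story equally, which is why macro-F1 is the natural choice when each author class should matter equally.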