Artificial intelligence systems based on large language models (LLMs) can now generate coherent text, music, and images, yet they operate without a persistent state: each inference reconstructs context from scratch. This paper introduces the Narrative Continuity Test (NCT) -- a conceptual framework for evaluating identity persistence and diachronic coherence in AI systems. Unlike capability benchmarks that assess task performance, the NCT examines whether an LLM remains the same interlocutor across time and interaction gaps. The framework defines five necessary axes -- Situated Memory, Goal Persistence, Autonomous Self-Correction, Stylistic & Semantic Stability, and Persona/Role Continuity -- and explains why current architectures systematically fail to support them. Case analyses (Character.AI, Grok, Replit, Air Canada) show predictable continuity failures under stateless inference. The NCT reframes AI evaluation from performance to persistence, outlining conceptual requirements for future benchmarks and architectural designs that could sustain long-term identity and goal coherence in generative models.