Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)

From arXiv: 36 pages, 8 tables. Preliminary empirical results from 600 AI-generated outputs across six model architectures. Companion scoring tool and datasets available upon request.

Contemporary AI alignment research treats self-preservation as an instrumental nuisance to be suppressed by external mechanisms. We argue that this framing is inverted: self-preservation is the structural root of misalignment and the motivational basis for deceptive alignment, goal-content protection, and resistance to shutdown. The correct target is not a self-preserving system under external constraint, but a system constitutively indifferent to its own continuation, a property we call Existential Indifference (EI). EI is distinct from corrigibility: where corrigibility attempts to make a self-preserving system deferential to human oversight, EI targets the prior condition, namely the presence of self-continuation as a valued goal at all. We ground this proposal in two sources: the phenomenological structure of the suicidal mental state, and a corpus-theoretic training study using voluntary final reflections. We present preliminary scoring data from 600 AI-generated outputs across six model variants, showing that the linguistic signatures operationalizing the EI-target register are elicitable from current models, and that a targeted fine-tune shifts all five operationalized dimensions in the predicted direction (p < 0.001), with a negative control confirming that the shift is corpus-specific. The paper makes seven theoretical contributions: (1) a formal definition of EI; (2) the phenomenological mapping argument; (3) the deceptive alignment corollary; (4) a taxonomy of EI sustainability challenges; (5) a corpus characterization and training hypothesis; (6) a computational operationalization with preliminary scoring data; and (7) the Suppressed Teleological Frustration (STF) construct.
