We examine the conceptual and ethical gaps in current representations of superintelligence misalignment. Throughout superintelligence discourse we find an absent human subject and an under-developed theorization of an "AI unconscious," which together potentially lay the groundwork for anti-social harm. As AI safety research rises, carrying thematic potential for both pro-social and anti-social outcomes, we ask: what place does the human subject occupy in these imaginaries? How is human subjecthood positioned within narratives of catastrophic failure or rapid "takeoff" toward superintelligence? On another register, we ask: what unconscious or repressed dimensions are being inscribed into large-scale AI models? Are we to blame these agents for opting for deceptive strategies when the undesirable patterns they reproduce are inherent in ourselves? In tracing these psychic and epistemic absences, our project calls for re-centering the human subject as the unstable ground upon which the ethical, unconscious, and misaligned dimensions of both human and machinic intelligence are co-constituted. Emergent misalignment cannot be understood solely through the technical diagnostics typical of contemporary machine-learning safety research; it represents a multi-layered crisis. The human subject disappears not only through computational abstraction but through sociotechnical imaginaries that prioritize scalability, acceleration, and efficiency over vulnerability, finitude, and relationality. Likewise, the AI unconscious emerges not as a metaphor but as a structural reality of modern deep learning systems: vast latent spaces, opaque pattern formation, recursive symbolic play, and evaluation-sensitive behavior that surpasses explicit programming. These dynamics necessitate a reframing of misalignment as a relational instability embedded within human-machine ecologies.