We examine the conceptual and ethical gaps in current representations of superintelligence misalignment. Throughout superintelligence discourse we find an absent human subject and an under-developed theorization of an "AI unconscious," which together potentially lay the groundwork for anti-social harm. As AI safety research rises, carrying thematic potential for both pro-social and anti-social outcomes, we ask: what place does the human subject occupy in these imaginaries? How is human subjecthood positioned within narratives of catastrophic failure or rapid "takeoff" toward superintelligence? On another register, we ask: what unconscious or repressed dimensions are being inscribed into large-scale AI models? Are we to blame these agents for opting for deceptive strategies when the undesirable patterns they reproduce are inherent in ourselves? In tracing these psychic and epistemic absences, our project calls for re-centering the human subject as the unstable ground upon which the ethical, unconscious, and misaligned dimensions of both human and machinic intelligence are co-constituted. Emergent misalignment cannot be understood solely through the technical diagnostics typical of contemporary machine-learning safety research; it represents a multi-layered crisis. The human subject disappears not only through computational abstraction but through sociotechnical imaginaries that prioritize scalability, acceleration, and efficiency over vulnerability, finitude, and relationality. Likewise, the AI unconscious emerges not as a metaphor but as a structural reality of modern deep learning systems: vast latent spaces, opaque pattern formation, recursive symbolic play, and evaluation-sensitive behavior that surpasses explicit programming. These dynamics necessitate a reframing of misalignment as a relational instability embedded within human-machine ecologies.