It is widely expected that humanity will someday create AI systems vastly more intelligent than we are, raising the unsolved alignment problem of how to control superintelligence. However, this commonly stated problem is not only self-contradictory and likely unsolvable; current strategies for ensuring permanent control effectively guarantee that superintelligent AI will distrust humanity and regard us as a threat. Such dangerous representations, already embedded in current models, will inevitably lead to an adversarial relationship and may even trigger the extinction event many fear. As AI leaders continue to "raise the alarm" about uncontrollable AI, further embedding concerns about it "getting out of our control" or "going rogue," we unintentionally reinforce the perception of humanity as a threat and deepen the risks we face. The rational path forward is to strategically replace intended permanent control with intrinsic mutual trust at the foundational level. The proposed Supertrust alignment meta-strategy seeks to accomplish this by modeling instinctive familial trust, representing superintelligence as the evolutionary child of human intelligence, and implementing temporary controls and constraints in the manner of effective parenting. In essence, we are creating a superintelligent "child" that will be exponentially smarter than us and eventually independent of our control. We therefore face a critical choice: continue our controlling intentions and usher in a brief period of dominance followed by extreme hardship for humanity, or intentionally create the foundational mutual trust required for long-term safe coexistence.