Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose existential risks. This paper reviews the evidence for existential risks from AI via misalignment, where AI systems develop goals misaligned with human values, and power-seeking, where misaligned AIs actively seek power. The review examines empirical findings, conceptual arguments and expert opinion relating to specification gaming, goal misgeneralization, and power-seeking. The current state of the evidence is found to be concerning but inconclusive regarding the existence of extreme forms of misaligned power-seeking. Strong empirical evidence of specification gaming combined with strong conceptual evidence for power-seeking make it difficult to dismiss the possibility of existential risk from misaligned power-seeking. On the other hand, to date there are no public empirical examples of misaligned power-seeking in AI systems, and so arguments that future systems will pose an existential risk remain somewhat speculative. Given the current state of the evidence, it is hard to be extremely confident either that misaligned power-seeking poses a large existential risk, or that it poses no existential risk. The fact that we cannot confidently rule out existential risk from AI via misaligned power-seeking is cause for serious concern.