A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking

Rapid advancements in artificial intelligence (AI) have sparked growing concerns among experts, policymakers, and world leaders regarding the potential for increasingly advanced AI systems to pose existential risks. This paper reviews the evidence for existential risks from AI via misalignment, where AI systems develop goals misaligned with human values, and power-seeking, where misaligned AIs actively seek power. The review examines empirical findings, conceptual arguments and expert opinion relating to specification gaming, goal misgeneralization, and power-seeking. The current state of the evidence is found to be concerning but inconclusive regarding the existence of extreme forms of misaligned power-seeking. Strong empirical evidence of specification gaming combined with strong conceptual evidence for power-seeking make it difficult to dismiss the possibility of existential risk from misaligned power-seeking. On the other hand, to date there are no public empirical examples of misaligned power-seeking in AI systems, and so arguments that future systems will pose an existential risk remain somewhat speculative. Given the current state of the evidence, it is hard to be extremely confident either that misaligned power-seeking poses a large existential risk, or that it poses no existential risk. The fact that we cannot confidently rule out existential risk from AI via misaligned power-seeking is cause for serious concern.

