Artificial intelligence (AI) is advancing rapidly, but achieving complete human control over AI risks remains an unsolved problem, akin to driving a fast AI "train" without a "brake system." By exploring fundamental control mechanisms at key elements of AI decisions, we develop a systematic solution for thoroughly controlling AI risks: an architecture for AI governance and legislation with five pillars supported by six control mechanisms, illustrated through a minimum set of AI Mandates (AIMs). Three of the AIMs must be built inside AI systems and three in society to address the major areas of AI risk: 1) align AI values with those of human users; 2) constrain AI decisions and actions with societal ethics, laws, and regulations; 3) build in human intervention options for emergencies and shut-off switches for existential threats; 4) limit AI access to user resources to reinforce the controls inside AI; 5) mitigate spillover risks such as AI-driven job loss. We also highlight how AI governance differs between physical AI systems and generative AI. We then discuss how to strengthen analog physical safeguards that exploit AI's intrinsic disconnect from the analog physical world to prevent AI, or artificial general intelligence (AGI), from circumventing core safety controls: AI is pure software code running on chips controlled by humans, and every AI-driven physical action must first be digitized. These findings establish a theoretical foundation for AI governance and legislation as the basic structure of a "brake system" for AI decisions. If implemented, these controls can rein in AI dangers as completely as is humanly possible, closing large portions of currently wide-open AI risk and substantially reducing overall AI risk to the level of residual human error.
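To make the "brake system" concrete, the sketch below illustrates one way AIMs 3 and 4 might look in software: every AI-requested action must pass through a human-controlled gateway that checks a per-user resource allowlist and an emergency shut-off switch before the action is digitized and dispatched. This is a minimal illustrative sketch, not an implementation from the paper; all names (ActionGateway, engage_shutoff, dispatch, and so on) are hypothetical.

```python
# Minimal illustrative sketch (hypothetical names): a human-controlled
# gateway between an AI system and any digitized actuator, enforcing
# AIM 3 (human intervention / shut-off) and AIM 4 (limits on AI access
# to user resources).
from dataclasses import dataclass, field


@dataclass
class ActionGateway:
    # Resources this AI instance may touch (AIM 4); granted by the human user.
    allowed_resources: set[str] = field(default_factory=set)
    # Emergency shut-off switch (AIM 3); only humans can toggle it.
    shutoff_engaged: bool = False

    def engage_shutoff(self) -> None:
        """Human-only control: halts all further AI-driven actions."""
        self.shutoff_engaged = True

    def dispatch(self, resource: str, action: str) -> bool:
        """Every AI action must pass here before being digitized and sent on."""
        if self.shutoff_engaged:
            print(f"BLOCKED (shut-off engaged): {action} on {resource}")
            return False
        if resource not in self.allowed_resources:
            print(f"BLOCKED (resource not granted): {action} on {resource}")
            return False
        print(f"DISPATCHED: {action} on {resource}")
        return True


# Usage: the AI can act only on resources the human has granted,
# and the human shut-off switch overrides everything.
gate = ActionGateway(allowed_resources={"thermostat"})
gate.dispatch("thermostat", "set 20C")  # allowed
gate.dispatch("door_lock", "unlock")    # blocked by the AIM 4 limit
gate.engage_shutoff()
gate.dispatch("thermostat", "set 25C")  # blocked by the AIM 3 shut-off
```

The key design point mirrors the digitization prerequisite in the abstract: because every physical action must pass through this digital chokepoint, the safeguards live outside the AI's own code, on infrastructure humans control.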