Foundation models could eventually introduce several pathways for undermining state security: accidents, inadvertent escalation, unintentional conflict, the proliferation of weapons, and interference with human diplomacy are just a few on a long list. The Confidence-Building Measures for Artificial Intelligence workshop, hosted by the Geopolitics Team at OpenAI and the Berkeley Risk and Security Lab at the University of California, brought together a multistakeholder group to think through tools and strategies for mitigating the risks that foundation models may pose to international security. Originating in the Cold War, confidence-building measures (CBMs) are actions that reduce hostility, prevent conflict escalation, and improve trust between parties. The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape. Participants identified the following CBMs that directly apply to foundation models, each of which is explained further in these conference proceedings: (1) crisis hotlines; (2) incident sharing; (3) model, transparency, and system cards; (4) content provenance and watermarks; (5) collaborative red teaming and table-top exercises; and (6) dataset and evaluation sharing. Because most foundation model developers are non-government entities, many CBMs will need to involve a wider stakeholder community. These measures can be implemented either by AI labs or by relevant government actors.