Foundation models could eventually introduce several pathways for undermining state security: accidents, inadvertent escalation, unintentional conflict, the proliferation of weapons, and interference with human diplomacy are just a few on a long list. The Confidence-Building Measures for Artificial Intelligence workshop, hosted by the Geopolitics Team at OpenAI and the Berkeley Risk and Security Lab at the University of California, brought together a multistakeholder group to think through the tools and strategies to mitigate the potential risks that foundation models pose to international security. Originating in the Cold War, confidence-building measures (CBMs) are actions that reduce hostility, prevent conflict escalation, and improve trust between parties. The flexibility of CBMs makes them a key instrument for navigating the rapid changes in the foundation model landscape. Participants identified the following CBMs that directly apply to foundation models, each of which is explained further in these conference proceedings: (1) crisis hotlines; (2) incident sharing; (3) model, transparency, and system cards; (4) content provenance and watermarks; (5) collaborative red teaming and table-top exercises; and (6) dataset and evaluation sharing. Because most foundation model developers are non-government entities, many CBMs will need to involve a wider stakeholder community. These measures can be implemented either by AI labs or by relevant government actors.