Frontier AI companies increasingly rely on external evaluations to assess risks from dangerous capabilities before deployment. However, external evaluators often receive limited model access, limited information, and little time, which can reduce evaluation rigour and confidence. The EU General-Purpose AI Code of Practice calls for "appropriate access", but does not specify what this means in practice. Furthermore, there is no common framework for describing different types and levels of evaluator access. To address this gap, we propose a taxonomy of access methods for dangerous capability evaluations. We disentangle three aspects of access: model access, model information, and evaluation timeframe. For each aspect, we review benefits and risks, including how expanding access can reduce false negatives and improve stakeholder trust, but can also increase security and capacity challenges. We argue that these limitations can likely be mitigated through technical means and safeguards used in other industries. Based on the taxonomy, we propose three descriptive access levels: AL1 (black-box model access and minimal information), AL2 (grey-box model access and substantial information), and AL3 (white-box model access and comprehensive information), to support clearer communication between evaluators, frontier AI companies, and policymakers. We believe these levels correspond to the different standards for appropriate access defined in the EU Code of Practice, though these standards may change over time.