As AI systems advance, AI evaluations are becoming an important pillar of regulation aimed at ensuring safety. We argue that such regulation should require developers to explicitly identify and justify the key underlying assumptions of their evaluations as part of their case for safety. We identify core assumptions in AI evaluations (both for evaluating existing models and for forecasting the capabilities of future models), such as comprehensive threat modeling, proxy task validity, and adequate capability elicitation. Many of these assumptions cannot currently be well justified. If regulation is to be based on evaluations, it should require that AI development be halted if evaluations demonstrate unacceptable danger or if these assumptions are inadequately justified. Our presented approach aims to enhance transparency in AI development, offering a practical path towards more effective governance of advanced AI systems.