Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach, as well as lessons learned, for a broad audience. Key lessons include the following. First, theoretical underpinnings and frameworks are invaluable for organising the breadth of risk domains, modalities, forms, metrics, and goals. Second, the theory and practice of safety evaluation development each benefit from collaboration to clarify goals, methods, and challenges, and to facilitate the transfer of insights between different stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety, including established and emerging harms. For this reason, it is important that the wide range of actors working on safety evaluation, together with safety research communities, collaborate to develop, refine, and implement novel evaluation approaches and best practices, rather than operating in silos. The report concludes by outlining the clear need to rapidly advance the science of evaluations, to integrate new evaluations into the development and governance of AI, to establish scientifically grounded norms and standards, and to promote a robust evaluation ecosystem.