ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Jinyu Fan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng

Rapidly evolving AI exhibits increasingly strong autonomy and goal-directed capabilities, accompanied by derivative systemic risks that are more unpredictable, difficult to control, and potentially irreversible. However, current AI safety evaluation systems suffer from critical limitations, such as restricted risk dimensions and a failure to detect frontier risks. Lagging safety benchmarks and alignment technologies can hardly address the complex challenges posed by cutting-edge AI models. To bridge this gap, we propose the "ForesightSafety Bench" AI Safety Evaluation Framework, which begins with 7 major Fundamental Safety pillars and progressively extends to advanced Embodied AI Safety, AI4Science Safety, Social and Environmental AI risks, Catastrophic and Existential Risks, as well as 8 critical industrial safety domains, forming a total of 94 refined risk dimensions. To date, the benchmark has accumulated tens of thousands of structured risk data points and assessment results, establishing a broad, clearly hierarchical, and dynamically evolving AI safety evaluation framework. Based on this benchmark, we conduct a systematic evaluation and in-depth analysis of over twenty mainstream advanced large models, identifying key risk patterns and their capability boundaries. The safety capability evaluation results reveal widespread safety vulnerabilities of frontier AI across multiple pillars, particularly in Risky Agentic Autonomy, AI4Science Safety, Embodied AI Safety, Social AI Safety, and Catastrophic and Existential Risks. Our benchmark is released at https://github.com/Beijing-AISI/ForesightSafety-Bench. The project website is available at https://foresightsafety-bench.beijing-aisi.ac.cn/.
