ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng

Rapidly evolving AI exhibits increasingly strong autonomy and goal-directed capabilities, giving rise to systemic risks that are more unpredictable, harder to control, and potentially irreversible. However, current AI safety evaluation systems suffer from critical limitations, such as restricted risk dimensions and failure to detect frontier risks. Lagging safety benchmarks and alignment techniques can hardly address the complex challenges posed by cutting-edge AI models. To bridge this gap, we propose the "ForesightSafety Bench" AI safety evaluation framework. It begins with 7 major Fundamental Safety pillars and progressively extends to advanced Embodied AI Safety, AI4Science Safety, Social and Environmental AI Risks, Catastrophic and Existential Risks, and 8 critical industrial safety domains, forming a total of 94 refined risk dimensions. To date, the benchmark has accumulated tens of thousands of structured risk data points and assessment results, establishing a broad-coverage, hierarchically clear, and dynamically evolving AI safety evaluation framework. Based on this benchmark, we conduct a systematic evaluation and in-depth analysis of over twenty mainstream advanced large models, identifying key risk patterns and the models' capability boundaries. The evaluation results reveal widespread safety vulnerabilities of frontier AI across multiple pillars, particularly in Risky Agentic Autonomy, AI4Science Safety, Embodied AI Safety, Social AI Safety, and Catastrophic and Existential Risks. Our benchmark is released at https://github.com/Beijing-AISI/ForesightSafety-Bench. The project website is available at https://foresightsafety-bench.beijing-aisi.ac.cn/.