ForesightSafety Bench: A Frontier Risk Evaluation and Governance Framework towards Safe AI

Haibo Tong, Feifei Zhao, Linghao Feng, Ruoyu Wu, Ruolin Chen, Lu Jia, Zhou Zhao, Jindong Li, Tenglong Li, Erliang Lin, Shuai Yang, Enmeng Lu, Yinqian Sun, Qian Zhang, Zizhe Ruan, Jinyu Fan, Zeyang Yue, Ping Wu, Huangrui Li, Chengyi Sun, Yi Zeng

Rapidly evolving AI exhibits increasingly strong autonomy and goal-directed capabilities, accompanied by derivative systemic risks that are more unpredictable, harder to control, and potentially irreversible. However, current AI safety evaluation systems suffer from critical limitations, such as a restricted set of risk dimensions and a failure to detect frontier risks. Lagging safety benchmarks and alignment technologies struggle to address the complex challenges posed by cutting-edge AI models. To bridge this gap, we propose the "ForesightSafety Bench" AI Safety Evaluation Framework, which begins with 7 major Fundamental Safety pillars and progressively extends to advanced Embodied AI Safety, AI4Science Safety, Social and Environmental AI risks, Catastrophic and Existential Risks, as well as 8 critical industrial safety domains, forming a total of 94 refined risk dimensions. To date, the benchmark has accumulated tens of thousands of structured risk data points and assessment results, establishing a broad, clearly hierarchical, and dynamically evolving AI safety evaluation framework. Based on this benchmark, we conduct a systematic evaluation and in-depth analysis of over twenty mainstream advanced large models, identifying key risk patterns and their capability boundaries. The safety capability evaluation results reveal widespread safety vulnerabilities of frontier AI across multiple pillars, particularly in Risky Agentic Autonomy, AI4Science Safety, Embodied AI Safety, Social AI Safety, and Catastrophic and Existential Risks. Our benchmark is released at https://github.com/Beijing-AISI/ForesightSafety-Bench. The project website is available at https://foresightsafety-bench.beijing-aisi.ac.cn/.

