Rapidly evolving AI exhibits increasingly strong autonomy and goal-directed capabilities, accompanied by derivative systemic risks that are more unpredictable, harder to control, and potentially irreversible. However, current AI safety evaluation systems suffer from critical limitations, such as a restricted set of risk dimensions and a failure to detect frontier risks. Lagging safety benchmarks and alignment technologies can hardly address the complex challenges posed by cutting-edge AI models. To bridge this gap, we propose the "ForesightSafety Bench" AI Safety Evaluation Framework, which begins with 7 major Fundamental Safety pillars and progressively extends to advanced Embodied AI Safety, AI4Science Safety, Social and Environmental AI risks, Catastrophic and Existential Risks, as well as 8 critical industrial safety domains, forming a total of 94 refined risk dimensions. To date, the benchmark has accumulated tens of thousands of structured risk data points and assessment results, establishing a broad, clearly hierarchical, and dynamically evolving AI safety evaluation framework. Based on this benchmark, we conduct a systematic evaluation and in-depth analysis of over twenty mainstream advanced large models, identifying key risk patterns and their capability boundaries. The safety capability evaluation results reveal widespread safety vulnerabilities of frontier AI across multiple pillars, particularly in Risky Agentic Autonomy, AI4Science Safety, Embodied AI Safety, Social AI Safety, and Catastrophic and Existential Risks. Our benchmark is released at https://github.com/Beijing-AISI/ForesightSafety-Bench. The project website is available at https://foresightsafety-bench.beijing-aisi.ac.cn/.