A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Xingjun Ma, Yixu Wang, Hengyuan Xu, Yutao Wu, Yifan Ding, Yunhan Zhao, Zilong Wang, Jiabin Hua, Ming Wen, Jianan Liu, Ranjie Duan, Yifeng Gao, Yingshui Tan, Yunhao Chen, Hui Xue, Xin Wang, Wei Cheng, Jingjing Chen, Zuxuan Wu, Bo Li, Yu-Gang Jiang

From arXiv, 41 pages, 22 figures

The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has driven major gains in reasoning, perception, and generation across language and vision. Whether these advances translate into comparable improvements in safety remains unclear, partly because existing evaluations are fragmented and focus on isolated modalities or threat models. In this report, we present an integrated safety evaluation of six frontier models (GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5), assessing each across language, vision-language, and image generation using a unified protocol that combines benchmark, adversarial, multilingual, and compliance evaluations. By aggregating results into safety leaderboards and model profiles, we reveal a highly uneven safety landscape: while GPT-5.2 demonstrates consistently strong and balanced performance, other models exhibit clear trade-offs across benchmark safety, adversarial robustness, multilingual generalization, and regulatory compliance. Despite strong results under standard benchmarks, all models remain highly vulnerable under adversarial testing, with worst-case safety rates dropping below 6%. Text-to-image models show slightly stronger alignment in regulated visual risk categories, yet remain fragile when faced with adversarial or semantically ambiguous prompts. Overall, these findings highlight that safety in frontier models is inherently multidimensional, shaped by modality, language, and evaluation design. This underscores the need for standardized, holistic safety assessments that better reflect real-world risk and guide responsible deployment.

