Multimodal large language models (MLLMs) have recently emerged as a unified paradigm for language and image generation. Compared with diffusion models, MLLMs have much stronger semantic understanding, which lets them process more complex textual inputs and grasp richer contextual meaning. However, this enhanced semantic ability may also introduce new and potentially greater safety risks. Taking diffusion models as a reference point, we systematically analyze and compare the safety risks of emerging MLLMs along two dimensions: unsafe content generation and fake image synthesis. Across multiple unsafe-generation benchmark datasets, we observe that MLLMs tend to generate more unsafe images than diffusion models. This difference arises partly because diffusion models often fail to interpret abstract prompts and produce corrupted outputs, whereas MLLMs comprehend these prompts and generate unsafe content. MLLM-generated images are also notably harder for current advanced fake image detectors to identify. Even when detectors are retrained with MLLM-specific data, they can still be bypassed simply by giving the MLLMs longer and more descriptive inputs. Our measurements indicate that the emerging safety risks of MLLMs, the cutting-edge generative paradigm, have not been sufficiently recognized and pose new challenges to real-world safety.