We introduce Llama Guard 3 Vision, a multimodal LLM-based safeguard for human-AI conversations that involve image understanding: it can be used to safeguard content for both multimodal LLM inputs (prompt classification) and outputs (response classification). Unlike the previous text-only Llama Guard versions (Inan et al., 2023; Llama Team, 2024a,b), it is specifically designed to support image reasoning use cases and is optimized to detect harmful multimodal (text and image) prompts and text responses to those prompts. Llama Guard 3 Vision is fine-tuned from Llama 3.2-Vision and demonstrates strong performance on internal benchmarks using the MLCommons taxonomy. We also test its robustness against adversarial attacks. We believe that Llama Guard 3 Vision serves as a good starting point for building more capable and robust content moderation tools for human-AI conversations with multimodal capabilities.