This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that addresses the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework comprising a customizable safety taxonomy, data preprocessing, augmentation, and a training setup. To teach VLMs to assess safety, we further create a multimodal safety dataset with high-quality human expert annotations, in which each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B parameters, serve as versatile tools for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework publicly available, including the dataset and model weights.