This paper introduces LlavaGuard, a suite of VLM-based vision safeguards that addresses the critical need for reliable guardrails in the era of large-scale data and models. To this end, we establish a novel open framework comprising a customizable safety taxonomy, data preprocessing, augmentation, and a training setup. To teach VLMs to assess safety, we further create a multimodal safety dataset with high-quality human expert annotations, in which each image is labeled with a safety rating, category, and rationale. We also employ advanced augmentations to support context-specific assessments. The resulting LlavaGuard models, ranging from 0.5B to 7B parameters, serve as versatile tools for evaluating the safety compliance of visual content against flexible policies. In comprehensive experiments, LlavaGuard outperforms both state-of-the-art safeguards and VLMs in accuracy and in flexibly handling different policies. Additionally, we demonstrate LlavaGuard's performance in two real-world applications: large-scale dataset annotation and moderation of text-to-image models. We make our entire framework publicly available, including the dataset and model weights.