Unsafe2Safe: Controllable Image Anonymization for Downstream Utility

Large-scale image datasets frequently contain identifiable or sensitive content, raising privacy risks when training models that may memorize and leak such information. We present Unsafe2Safe, a fully automated pipeline that detects privacy-prone images and rewrites only their sensitive regions using multimodally guided diffusion editing. Unsafe2Safe operates in two stages. Stage 1 uses a vision-language model to (i) inspect images for privacy risks, (ii) generate paired private and public captions that respectively include and omit sensitive attributes, and (iii) prompt a large language model to produce structured, identity-neutral edit instructions conditioned on the public caption. Stage 2 employs instruction-driven diffusion editors to apply these dual textual prompts, producing privacy-safe images that preserve global structure and task-relevant semantics while neutralizing private content. To measure anonymization quality, we introduce a unified evaluation suite covering Quality, Cheating, Privacy, and Utility dimensions. Across MS-COCO, Caltech101, and MIT Indoor67, Unsafe2Safe reduces face similarity, text similarity, and demographic predictability by large margins, while maintaining downstream model accuracy comparable to training on raw data. Fine-tuning diffusion editors on our automatically generated triplets (private caption, public caption, edit instruction) further improves both privacy protection and semantic fidelity. Unsafe2Safe provides a scalable, principled solution for constructing large, privacy-safe datasets without sacrificing visual consistency or downstream utility.
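The two-stage control flow described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: every name here (`EditTriplet`, `stage1`, `anonymize`, and the `toy_*` stand-ins) is hypothetical, images are represented as plain strings, and the reading of "dual textual prompts" as the public caption plus the edit instruction is an assumption. In practice the callables would wrap a vision-language model, a large language model, and an instruction-driven diffusion editor.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class EditTriplet:
    """The (private caption, public caption, edit instruction) triplet."""
    private_caption: str
    public_caption: str
    edit_instruction: str


@dataclass
class Stage1Result:
    is_unsafe: bool
    triplet: Optional[EditTriplet]  # None when the image is judged safe


def stage1(
    image: Any,
    inspect: Callable[[Any], bool],
    caption: Callable[[Any], Tuple[str, str]],
    instruct: Callable[[str], str],
) -> Stage1Result:
    """Stage 1: inspect for privacy risk, caption twice, derive an instruction."""
    if not inspect(image):
        return Stage1Result(is_unsafe=False, triplet=None)
    private_caption, public_caption = caption(image)
    # The instruction is conditioned on the public caption only.
    instruction = instruct(public_caption)
    return Stage1Result(True, EditTriplet(private_caption, public_caption, instruction))


def anonymize(
    image: Any,
    inspect: Callable[[Any], bool],
    caption: Callable[[Any], Tuple[str, str]],
    instruct: Callable[[str], str],
    edit: Callable[[Any, str, str], Any],
) -> Any:
    """Run both stages; images judged safe pass through unchanged."""
    result = stage1(image, inspect, caption, instruct)
    if not result.is_unsafe or result.triplet is None:
        return image
    # Stage 2 (assumed): the editor receives the public caption and the instruction.
    return edit(image, result.triplet.public_caption, result.triplet.edit_instruction)


# Toy stand-ins so the sketch runs without any model (all hypothetical).
def toy_inspect(image: str) -> bool:
    return "face" in image


def toy_caption(image: str) -> Tuple[str, str]:
    return (f"private: {image}", "a person standing in a kitchen")


def toy_instruct(public_caption: str) -> str:
    return f"Replace identifying details so the image matches: {public_caption}"


def toy_edit(image: str, public_caption: str, instruction: str) -> str:
    return f"edited({image})"
```

Passing the models in as callables keeps the sketch independent of any particular library, and the intermediate `EditTriplet` is the same record the abstract describes reusing as fine-tuning data for the editor.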
