Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, to our knowledge the first framework for unsupervised rare-attribute discovery in diffusion models. RAIGen leverages Matryoshka Sparse Autoencoders and a novel minority metric combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation.
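The abstract describes a minority metric that combines neuron activation frequency with semantic distinctiveness. The following is a minimal sketch of one plausible reading, not the paper's actual formulation: rarity is taken as one minus the firing frequency of each sparse-autoencoder neuron, distinctiveness as one minus the mean cosine similarity of each neuron's semantic embedding to all others, and the two are multiplied. The function name, the product form, and the use of per-neuron embeddings (e.g., CLIP embeddings of top-activating images) are all assumptions for illustration.

```python
import numpy as np

def minority_scores(acts, sem_vectors, eps=1e-8):
    """Hypothetical minority metric: rarity times semantic distinctiveness.

    acts: (n_samples, n_neurons) nonnegative SAE activations.
    sem_vectors: (n_neurons, d) one semantic embedding per neuron,
        e.g., an embedding of its top-activating images (assumption).
    Returns (n_neurons,): higher means rarer and more distinct.
    """
    # Activation frequency: fraction of samples on which each neuron fires.
    freq = (acts > 0).mean(axis=0)
    rarity = 1.0 - freq  # rarely-firing neurons score high

    # Semantic distinctiveness: 1 minus mean cosine similarity
    # to every other neuron's semantic vector.
    v = sem_vectors / (np.linalg.norm(sem_vectors, axis=1, keepdims=True) + eps)
    sim = v @ v.T
    n = sim.shape[0]
    mean_sim = (sim.sum(axis=1) - 1.0) / (n - 1)  # exclude self-similarity
    distinct = 1.0 - mean_sim

    return rarity * distinct
```

Under this sketch, a neuron that fires on every sample gets rarity zero and is ruled out regardless of its semantics, while a rare neuron whose embedding sits far from the rest of the dictionary scores highest.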