Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, the first framework, to our knowledge, for label-free rare-attribute discovery in diffusion models, requiring no predefined minority categories. RAIGen leverages Matryoshka Sparse Autoencoders and a novel minority metric combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation. The project page is available at https://vssilpa.github.io/RAIGen_webpage/ .
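The abstract does not specify the form of the minority metric, so the following is only a minimal sketch of how activation frequency and semantic distinctiveness might be combined to rank sparse-autoencoder neurons. Everything in it is an assumption made for illustration, not RAIGen's actual formulation: the function names, the use of decoder directions as the semantic representation, the cosine-based distinctiveness, and the product form `(1 - frequency) * distinctiveness`.

```python
import numpy as np

def activation_frequency(acts, threshold=0.0):
    """Fraction of images on which each neuron fires (activation > threshold).

    acts: array of shape (n_images, n_neurons) holding SAE activations.
    """
    return (acts > threshold).mean(axis=0)

def semantic_distinctiveness(directions):
    """Assumed proxy: 1 minus the mean cosine similarity to all other neurons.

    directions: array of shape (n_neurons, dim), one nonzero vector per neuron
    (for example, SAE decoder directions).
    """
    unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    mean_sim = sim.sum(axis=1) / (len(unit) - 1)
    return 1.0 - mean_sim

def minority_score(acts, directions, threshold=0.0):
    """Hypothetical score: high for rarely firing, semantically distinct neurons."""
    freq = activation_frequency(acts, threshold)
    score = (1.0 - freq) * semantic_distinctiveness(directions)
    score[freq == 0.0] = -np.inf  # neurons that never fire carry no evidence
    return score

def top_activating(acts, neuron, k=3):
    """Indices of the k images with the largest activation for one neuron."""
    return np.argsort(acts[:, neuron])[::-1][:k]
```

Under these assumptions, one would rank neurons by `minority_score` and inspect the images returned by `top_activating` for the highest-ranked neurons to see which underrepresented attribute each appears to encode.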