We explore the problem of generating minority samples using diffusion models. Minority samples are instances that lie in low-density regions of a data manifold. Generating sufficient numbers of such minority instances is important, since they often contain unique attributes of the data. However, because majority samples (those lying in high-density regions of the manifold) have high likelihoods, the conventional generation process of diffusion models mostly yields them, which makes it highly ineffective and time-consuming for this task. In this work, we present a novel framework that makes the generation process of diffusion models focus on minority samples. We first provide a new insight into the majority-focused nature of diffusion models: they denoise in favor of majority samples. This observation motivates us to introduce a metric that describes the uniqueness of a given sample. To address the inherent preference of diffusion models for majority samples, we further develop minority guidance, a sampling technique that can guide the generation process toward regions with desired likelihood levels. Experiments on real-world benchmark datasets demonstrate that minority guidance significantly improves the ability to generate low-likelihood minority samples compared with existing generative frameworks, including the standard diffusion sampler.
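The abstract does not say how the uniqueness metric is computed. As a hedged illustration of the stated insight that denoising favors majority samples, the sketch below uses a hypothetical toy setup: a 1D two-component Gaussian mixture (95% majority mode, 5% minority mode) whose exact denoiser E[x_0 | x_t] is available in closed form, and a hypothetical `uniqueness_score` defined as the expected squared error between a sample and its one-shot denoised reconstruction. The mixture parameters, the noise level `alpha_bar`, and the score definition are all assumptions made for illustration. This is not the paper's implementation.

```python
import math
import random

# Hypothetical toy data distribution (illustration only): a 1D Gaussian
# mixture with a dominant "majority" mode and a rare "minority" mode.
WEIGHTS = (0.95, 0.05)  # mixture weights: majority, minority
MEANS = (-2.0, 2.0)     # component means
STD = 0.5               # shared component standard deviation


def denoise(x_t: float, alpha_bar: float) -> float:
    """Exact posterior mean E[x_0 | x_t] for the toy mixture under the
    variance-preserving forward process
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    sa = math.sqrt(alpha_bar)
    var_t = alpha_bar * STD**2 + (1.0 - alpha_bar)  # Var(x_t) within one component
    gain = sa * STD**2 / var_t                      # Cov(x_0, x_t) / Var(x_t)
    # Unnormalised log-responsibilities: log w_k + log N(x_t; sa * mu_k, var_t),
    # dropping the normalising constant shared by both components.
    logits = [math.log(w) - (x_t - sa * mu) ** 2 / (2.0 * var_t)
              for w, mu in zip(WEIGHTS, MEANS)]
    m = max(logits)
    resp = [math.exp(l - m) for l in logits]
    z = sum(resp)
    # Weight each component's posterior mean by its responsibility.
    return sum((r / z) * (mu + gain * (x_t - sa * mu))
               for r, mu in zip(resp, MEANS))


def uniqueness_score(x0: float, alpha_bar: float = 0.5,
                     n: int = 20000, seed: int = 0) -> float:
    """Hypothetical metric: Monte Carlo estimate of
    E_eps[(x0 - denoise(x_t))^2], i.e. how poorly x0 is recovered
    after noising it and denoising it in one shot."""
    rng = random.Random(seed)
    sa, sn = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    total = 0.0
    for _ in range(n):
        x_t = sa * x0 + sn * rng.gauss(0.0, 1.0)
        total += (x0 - denoise(x_t, alpha_bar)) ** 2
    return total / n


if __name__ == "__main__":
    for label, x0 in (("majority", -2.0), ("minority", 2.0)):
        print(f"{label:8s} x0={x0:+.1f}  score={uniqueness_score(x0):.3f}")
```

In this toy, a noised minority sample is often attributed to the majority mode by the denoiser and is pulled toward it, so its reconstruction error is much larger than that of a majority sample. This gives one concrete reading of "denoising in favor of the majority" and of a reconstruction-based uniqueness measure.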