Diffusion models are a powerful class of generative models that can produce high-quality images, but they can inherit bias from their training data. Data bias occurs when the training data does not reflect the true distribution of the data domain and instead exhibits skewed or imbalanced patterns. For example, the CelebA dataset contains more female images than male images, which can lead to biased generation results and affect downstream applications. In this paper, we propose a novel method that mitigates data bias in diffusion models through manifold guidance. Our key idea is to estimate the manifold of the training data using a learnable information-theoretic approach and then use this estimate to guide the sampling process of the diffusion model. In this way, we encourage the generated images to be uniformly distributed on the data manifold, without changing the model architecture, requiring labels, or retraining. We provide theoretical analysis and empirical evidence showing that our method can improve both the quality and the unbiasedness of image generation compared with standard diffusion models.
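The abstract does not specify the form of the guidance term, so the following is only a minimal sketch of the general idea of steering a score-based sampler at inference time without retraining. It is a generic guided Langevin update on a one-dimensional toy problem, not the manifold guidance proposed in the paper. The names `guided_langevin_step`, `sample`, `score_fn`, `guidance_grad_fn`, and `scale` are hypothetical, and the analytic Gaussian score stands in for a trained diffusion model.

```python
import numpy as np


def guided_langevin_step(x, score_fn, guidance_grad_fn, step_size, scale, rng):
    """One unadjusted Langevin update with an added guidance gradient.

    score_fn stands in for a frozen, pretrained score model (assumption);
    guidance_grad_fn is a hypothetical gradient of a log-weight that
    reshapes where samples concentrate.
    """
    drift = score_fn(x) + scale * guidance_grad_fn(x)
    noise = rng.standard_normal(x.shape)
    return x + step_size * drift + np.sqrt(2.0 * step_size) * noise


def sample(score_fn, guidance_grad_fn, n_samples=1000, n_steps=500,
           step_size=0.01, scale=1.0, seed=0):
    """Run many independent chains and return their final states."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    for _ in range(n_steps):
        x = guided_langevin_step(
            x, score_fn, guidance_grad_fn, step_size, scale, rng
        )
    return x


def base_score(x):
    # Score (gradient of the log-density) of a standard normal N(0, 1).
    return -x


def guide_grad(x):
    # Gradient of the log of an unnormalized N(2, 1) weight (toy guidance).
    return -(x - 2.0)


guided = sample(base_score, guide_grad, scale=1.0)
unguided = sample(base_score, guide_grad, scale=0.0)
```

In this toy setting the guided chains target the product of the two Gaussians, which is centered at 1, while the unguided chains stay centered at 0. The base score function is never modified, which mirrors the abstract's claim that guidance acts only on the sampling process.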