Latent diffusion models excel at producing high-quality images from text. However, concerns have been raised about the lack of diversity in the generated imagery. To address this, we introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity into richer dimensions such as color diversity. Diverse Diffusion is a general unsupervised technique that can be applied to existing text-to-image models. Our approach finds vectors in the Stable Diffusion latent space that are distant from each other: we sample latent vectors repeatedly until we obtain a set that meets the desired distance requirement and fills the required batch size. To evaluate our diversity methods, we conduct experiments examining several characteristics, including color diversity, the LPIPS metric, and ethnicity and gender representation in images featuring humans. The results underscore the importance of diversity in generating realistic and varied images and offer insights for improving text-to-image models. By enhancing image diversity, our approach contributes to more inclusive and representative AI-generated art.
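The latent selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_diverse_latents`, its parameters, the greedy accept/reject rule, the Euclidean distance, and the `max_tries` cap are all assumptions made for illustration, since the abstract does not specify the distance measure or the search strategy.

```python
import numpy as np


def sample_diverse_latents(batch_size, shape, min_dist, max_tries=10_000, seed=None):
    """Greedily collect Gaussian latents that are pairwise at least min_dist apart.

    Hypothetical helper: the Euclidean distance and the greedy accept/reject
    rule are illustrative assumptions, not details taken from the paper.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    for _ in range(max_tries):
        # Draw one candidate latent from the standard normal prior.
        candidate = rng.standard_normal(shape)
        # Keep it only if it is far enough from every latent accepted so far.
        if all(np.linalg.norm((candidate - z).ravel()) >= min_dist for z in accepted):
            accepted.append(candidate)
            if len(accepted) == batch_size:
                # Stack into an array of shape (batch_size, *shape).
                return np.stack(accepted)
    raise RuntimeError(
        f"found only {len(accepted)} of {batch_size} latents after {max_tries} tries"
    )


# Example call; the (4, 64, 64) latent shape and the threshold are assumed values.
latents = sample_diverse_latents(batch_size=4, shape=(4, 64, 64), min_dist=181.0, seed=0)
```

The returned array could then be supplied as the initial noise to a text-to-image pipeline that accepts user-provided latents. The threshold has to be tuned to the latent dimensionality, because distances between independent Gaussian samples concentrate around the square root of twice the number of dimensions.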