SustainDiffusion: Optimising the Social and Environmental Sustainability of Stable Diffusion Models

Background: Text-to-image generation models are widely used across numerous domains. Among these models, Stable Diffusion (SD) - an open-source text-to-image generation model - has become the most popular, producing over 12 billion images annually. However, the widespread use of these models raises concerns regarding their social and environmental sustainability. Aims: To reduce the harm that SD models may cause to society and the environment, we introduce SustainDiffusion, a search-based approach designed to enhance the social and environmental sustainability of SD models. Method: SustainDiffusion searches for the combination of hyperparameters and prompt structures that best reduces gender and ethnic bias in generated images while also lowering the energy consumption required for image generation. Importantly, SustainDiffusion maintains image quality comparable to that of the original SD model. Results: We conduct a comprehensive empirical evaluation of SustainDiffusion, testing it against six different baselines using 56 different prompts. Our results demonstrate that SustainDiffusion can reduce gender bias in SD3 by 68%, ethnic bias by 59%, and energy consumption (calculated as the sum of CPU and GPU energy) by 48%. Additionally, the outcomes produced by SustainDiffusion are consistent across multiple runs and generalise to various prompts. Conclusions: With SustainDiffusion, we demonstrate that enhancing the social and environmental sustainability of text-to-image generation models is possible without fine-tuning or changing the model's architecture.
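To make the idea of a search over hyperparameters and prompt structures with several competing objectives concrete, the following is a minimal sketch and not the SustainDiffusion implementation. The abstract does not name the search algorithm, so plain random search with Pareto filtering is used here as a stand-in. The `Candidate` fields, the prompt suffix, and the `evaluate` function (a synthetic formula that replaces actual image generation and measurement) are all hypothetical, and the image-quality check mentioned in the abstract is omitted.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    steps: int           # hypothetical hyperparameter: denoising steps
    guidance: float      # hypothetical hyperparameter: guidance scale
    prompt_suffix: str   # hypothetical prompt-structure choice


def evaluate(c):
    """Synthetic stand-in for generating and measuring images.

    Returns (gender_bias, ethnic_bias, energy); all are minimised.
    The numbers are invented purely for illustration.
    """
    neutral = c.prompt_suffix != ""
    gender_bias = (0.2 if neutral else 0.6) + 0.01 * c.guidance
    ethnic_bias = (0.3 if neutral else 0.5) + 0.2 / c.steps
    cpu_energy = 0.5 * c.steps
    gpu_energy = 2.0 * c.steps
    # Energy is the sum of CPU and GPU energy, as in the abstract.
    return (gender_bias, ethnic_bias, cpu_energy + gpu_energy)


def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep the (candidate, objectives) pairs that no other pair dominates."""
    return [(c, f) for c, f in scored
            if not any(dominates(g, f) for _, g in scored)]


def random_search(n_samples, seed=0):
    """Sample candidates at random and return the non-dominated ones."""
    rng = random.Random(seed)
    suffixes = ["", ", diverse group of people"]  # hypothetical suffixes
    scored = []
    for _ in range(n_samples):
        c = Candidate(
            steps=rng.randint(10, 50),
            guidance=rng.uniform(1.0, 10.0),
            prompt_suffix=rng.choice(suffixes),
        )
        scored.append((c, evaluate(c)))
    return pareto_front(scored)


def percent_reduction(baseline, new):
    """Relative reduction of a metric with respect to a baseline, in percent."""
    return 100.0 * (baseline - new) / baseline
```

Calling `random_search(50)` returns the trade-off configurations found among 50 random samples, and `percent_reduction` shows how figures such as the reported 48% energy saving are computed relative to a baseline. In a real setting, `evaluate` would be replaced by actual image generation plus bias and energy measurement.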
