Stable Diffusion XL (SDXL) has become the best open-source text-to-image (T2I) model thanks to its versatility and top-notch image quality. Addressing the computational demands of SDXL efficiently is crucial for broadening its reach and applicability. In this work, we introduce two scaled-down variants, Segmind Stable Diffusion (SSD-1B) and Segmind-Vega, with 1.3B- and 0.74B-parameter U-Nets, respectively. Both are obtained through progressive removal of components guided by layer-level losses, with the goal of reducing model size while preserving generative quality. We release the model weights at https://hf.co/Segmind. Our methodology eliminates residual (ResNet) blocks and transformer blocks from the U-Net of SDXL, yielding significant reductions in parameter count and latency. By capitalizing on knowledge transferred from the original SDXL, our compact models effectively emulate it and achieve results competitive with the larger, multi-billion-parameter SDXL. Our work underscores the efficacy of knowledge distillation coupled with layer-level losses in reducing model size while preserving the high-quality generative capabilities of SDXL, thus facilitating more accessible deployment in resource-constrained environments.
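The abstract does not spell out how the layer-level losses are combined. A minimal sketch, assuming one common formulation for distilling a pruned U-Net from its teacher: a denoising task loss against the true noise, an output-level distillation term that matches the teacher's noise prediction, and a feature-level term summed over aligned intermediate block outputs. Tensors are flattened to plain Python lists so the sketch runs without a deep-learning framework; the function names and the weights `lambda_out` and `lambda_feat` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length flat vectors."""
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layer_level_kd_loss(
    noise: Sequence[float],
    student_out: Sequence[float],
    teacher_out: Sequence[float],
    student_feats: List[Sequence[float]],
    teacher_feats: List[Sequence[float]],
    lambda_out: float = 1.0,   # hypothetical weight for output-level KD
    lambda_feat: float = 1.0,  # hypothetical weight for feature-level KD
) -> float:
    """Assumed combined objective: task loss + output KD + feature KD."""
    # Each student feature map must be paired with a teacher feature map.
    if len(student_feats) != len(teacher_feats):
        raise ValueError("feature lists must be aligned")
    # Task loss: the student predicts the noise that was added to the latent.
    l_task = mse(student_out, noise)
    # Output-level KD: match the teacher's final noise prediction.
    l_out = mse(student_out, teacher_out)
    # Feature-level KD: match intermediate outputs at each retained block.
    l_feat = sum(mse(s, t) for s, t in zip(student_feats, teacher_feats))
    return l_task + lambda_out * l_out + lambda_feat * l_feat


if __name__ == "__main__":
    loss = layer_level_kd_loss(
        noise=[0.0, 1.0],
        student_out=[0.0, 0.0],
        teacher_out=[0.0, 1.0],
        student_feats=[[1.0, 1.0]],
        teacher_feats=[[1.0, 3.0]],
    )
    print(loss)  # → 3.0
```

In this toy call the task, output, and feature terms are 0.5, 0.5, and 2.0, so the total is 3.0. In a real training loop the lists would be framework tensors and the feature pairs would come from hooks on the student and teacher U-Net blocks that survive pruning.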