The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support a variety of tasks. Deploying these models on-device offers lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters, which makes deployment challenging given the limited compute and memory available on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date on GPU-equipped mobile devices (under 12 seconds for Stable Diffusion 1.4 without int8 quantization on a Samsung S23 Ultra, for a 512x512 image with 20 iterations). These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
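As a rough illustration of the memory constraint the abstract describes, the sketch below estimates the storage needed for model weights alone at a few common precisions. The parameter count of 1 billion is a round figure standing in for "over 1 billion parameters", not a measured count for any specific model, and the helper name `weight_footprint_gib` is hypothetical. Activations, intermediate tensors, and runtime overhead are not included, so real memory use would be higher.

```python
# Bytes per parameter for common weight formats (standard sizes).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_footprint_gib(num_params: int, dtype: str) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Assumption: a round 1 billion parameters, used only for illustration.
    num_params = 1_000_000_000
    for dtype in BYTES_PER_PARAM:
        gib = weight_footprint_gib(num_params, dtype)
        print(f"{dtype}: {gib:.2f} GiB")
```

Under these assumptions, halving the bytes per parameter halves the weight footprint, which is one reason lower-precision formats are attractive on memory-constrained devices.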