LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models

Advances in latent diffusion models (LDMs) have revolutionized high-resolution image generation, but the design space of the autoencoder that is central to these systems remains underexplored. In this paper, we introduce LiteVAE, a family of autoencoders for LDMs that leverage the 2D discrete wavelet transform to enhance scalability and computational efficiency over standard variational autoencoders (VAEs) with no sacrifice in output quality. We also investigate the training methodologies and the decoder architecture of LiteVAE and propose several enhancements that improve the training dynamics and reconstruction quality. Our base LiteVAE model matches the quality of the established VAEs in current LDMs with a six-fold reduction in encoder parameters, leading to faster training and lower GPU memory requirements, while our larger model outperforms VAEs of comparable complexity across all evaluated metrics (rFID, LPIPS, PSNR, and SSIM).
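To make the role of the 2D discrete wavelet transform concrete, the following is a minimal NumPy sketch of one decomposition level and its inverse. It assumes a Haar wavelet, which the abstract does not specify, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical. It is not LiteVAE's implementation; it only illustrates how a wavelet transform trades spatial resolution for extra subbands without losing information.

```python
import numpy as np


def haar_dwt2(image):
    """One level of an orthonormal 2D Haar DWT on an (H, W) array with even H and W.

    Assumption: Haar is used purely for illustration; the abstract does not
    name the wavelet family used by LiteVAE.
    """
    x = np.asarray(image, dtype=np.float64)
    if x.ndim != 2 or x.shape[0] % 2 or x.shape[1] % 2:
        raise ValueError("expected a 2D array with even height and width")
    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # coarse approximation (low-pass in both directions)
    lh = (a - b + c - d) / 2.0  # left column minus right column
    hl = (a + b - c - d) / 2.0  # top row minus bottom row
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Invert haar_dwt2, reassembling each 2x2 block from the four subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


if __name__ == "__main__":
    img = np.arange(64, dtype=np.float64).reshape(8, 8)
    subbands = haar_dwt2(img)
    print([s.shape for s in subbands])
    print(np.allclose(haar_idwt2(*subbands), img))
```

Each subband has half the height and width of the input, so a network that consumes the stacked subbands works on a quarter of the spatial positions. The labels `lh` and `hl` follow one of several naming conventions in use; other libraries may swap them.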



Autoencoders


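An autoencoder is an artificial neural network that learns efficient data encodings in an unsupervised manner. Its goal is to learn a representation (encoding) of a dataset, typically for dimensionality reduction, by training the network to ignore signal "noise". Alongside this reduction step, the network learns a reconstruction step, in which it tries to generate from the reduced encoding an output as close as possible to the original input, hence the name. Several variants of the basic model exist, each designed to force the learned representation to have useful properties. Autoencoders are applied effectively to many problems, from face recognition to capturing the semantic meaning of words.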
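The encode-then-reconstruct idea can be sketched without any training loop. The example below assumes a purely linear autoencoder, whose optimal solution is given in closed form by principal component analysis; it is a stand-in for the neural networks described above, not a VAE, and the name `fit_linear_autoencoder` is hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(data, code_dim):
    """Return (encode, decode) for a linear autoencoder fitted via SVD.

    Assumption: a linear model solved in closed form (PCA) replaces the
    trained neural network, so the sketch stays short and deterministic.
    """
    x = np.asarray(data, dtype=np.float64)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    basis = vt[:code_dim]  # shape: (code_dim, n_features)

    def encode(samples):
        # Project centered samples onto the retained directions.
        return (np.asarray(samples, dtype=np.float64) - mean) @ basis.T

    def decode(codes):
        # Map codes back to the input space and restore the mean.
        return np.asarray(codes, dtype=np.float64) @ basis + mean

    return encode, decode


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 5-dimensional samples that actually lie on a 2-dimensional plane.
    samples = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 5)) + 1.0
    encode, decode = fit_linear_autoencoder(samples, code_dim=2)
    codes = encode(samples)
    print(codes.shape)
    print(np.allclose(decode(codes), samples))
```

Because the sample data has only two underlying degrees of freedom, a two-dimensional code is enough to reconstruct it; with real images the reconstruction is approximate, which is why metrics such as PSNR and SSIM are used to compare autoencoders.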
