We present Deep Compression Autoencoder (DC-AE), a new family of autoencoders for accelerating high-resolution diffusion models. Existing autoencoders have demonstrated impressive results at a moderate spatial compression ratio (e.g., 8x), but fail to maintain satisfactory reconstruction accuracy at high spatial compression ratios (e.g., 64x). We address this challenge with two key techniques: (1) Residual Autoencoding, where we design our models to learn residuals on top of space-to-channel transformed features, alleviating the optimization difficulty of high spatial-compression autoencoders; (2) Decoupled High-Resolution Adaptation, an efficient decoupled three-phase training strategy that mitigates the generalization penalty of high spatial-compression autoencoders. With these designs, we increase the autoencoder's spatial compression ratio up to 128x while maintaining reconstruction quality. Applying DC-AE to latent diffusion models, we achieve significant speedups without any accuracy drop. For example, on ImageNet 512x512, DC-AE provides a 19.1x inference speedup and a 17.9x training speedup for UViT-H on an H100 GPU while achieving a better FID than the widely used SD-VAE-f8 autoencoder. Our code is available at https://github.com/mit-han-lab/efficientvit.
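The space-to-channel shortcut behind Residual Autoencoding can be illustrated with a small NumPy sketch. This is a hypothetical standalone illustration under assumed shapes, not the released implementation: the shortcut reshapes each 2x2 spatial block into channels, then averages channel groups to match the downsampling block's output width, so the learned branch only has to model a residual on top of it.

```python
import numpy as np

def space_to_channel(x, factor=2):
    """Pixel-unshuffle: (C, H, W) -> (C * factor^2, H / factor, W / factor)."""
    c, h, w = x.shape
    x = x.reshape(c, h // factor, factor, w // factor, factor)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, f, f, H/f, W/f)
    return x.reshape(c * factor * factor, h // factor, w // factor)

def channel_average_shortcut(x, out_channels, factor=2):
    """Non-parametric shortcut: space-to-channel, then average groups of
    channels so the shortcut matches the learned branch's output shape."""
    y = space_to_channel(x, factor)
    c, h, w = y.shape
    assert c % out_channels == 0, "channel count must divide evenly"
    return y.reshape(out_channels, c // out_channels, h, w).mean(axis=1)

# A downsampling block would then compute: out = learned_branch(x) + shortcut,
# so the network only needs to learn the residual around this identity-like path.
x = np.arange(3 * 8 * 8, dtype=np.float64).reshape(3, 8, 8)
shortcut = channel_average_shortcut(x, out_channels=4)
print(shortcut.shape)  # (4, 4, 4)
```

The same transform applied in reverse (channel-to-space plus group duplication) gives the decoder's upsampling shortcut, keeping the encoder and decoder symmetric.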