Normalizing Flows (NFs) are a class of generative models distinguished by a mathematically invertible architecture: the forward pass transforms data into a latent space for density estimation, and the reverse pass generates new samples from that space. This characteristic creates an intrinsic synergy between representation learning and data generation. However, the generative quality of standard NFs is limited by the weak semantic representations that log-likelihood optimization alone produces. To remedy this, we propose a novel alignment strategy that leverages the invertibility of NFs: instead of regularizing the forward pass, we align the intermediate features of the generative (reverse) pass with representations from a powerful vision foundation model, which we show to be markedly more effective than naive forward-pass alignment. We also introduce a training-free, test-time optimization algorithm for classification, which provides a more intrinsic evaluation of the semantic knowledge embedded in the NF. Comprehensive experiments demonstrate that our approach accelerates NF training by more than 3.3$\times$, while simultaneously delivering significant improvements in both generative quality and classification accuracy. New state-of-the-art results for NFs are established on ImageNet 64$\times$64 and 256$\times$256. Our code is available at https://github.com/MCG-NJU/FlowBack.
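To make the reverse-pass alignment idea concrete, the following is a minimal PyTorch sketch of how such an alignment loss could look. The `reverse_with_features` hook, the frozen `foundation_encoder` (e.g., a DINOv2-style vision backbone), and the projection head `proj_head` are hypothetical interfaces introduced purely for illustration, not the released implementation; in practice this term would be added to the standard negative log-likelihood objective with a weighting coefficient.

```python
import torch
import torch.nn.functional as F

def reverse_alignment_loss(nf_model, proj_head, foundation_encoder, x, align_layer=4):
    """Align an intermediate feature of the NF's generative (reverse) pass
    with a frozen foundation-model representation of the same image.

    All model interfaces here are hypothetical sketches:
      - nf_model.forward(x)                 -> (z, log_det)
      - nf_model.reverse_with_features(...) -> intermediate feature at a chosen block
    """
    # Frozen semantic target from the vision foundation model (no gradients).
    with torch.no_grad():
        target = foundation_encoder(x)                       # (B, D_target)

    # Forward pass: data -> latent (this also drives the log-likelihood loss).
    z, log_det = nf_model.forward(x)

    # Reverse (generative) pass from the inferred latent, capturing the
    # intermediate feature at `align_layer` rather than regularizing forward features.
    feat = nf_model.reverse_with_features(z, capture_layer=align_layer)  # (B, D_feat)

    # Project NF features to the target dimension and maximize cosine similarity.
    pred = proj_head(feat)                                   # (B, D_target)
    return -F.cosine_similarity(pred, target, dim=-1).mean()
```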