Latent Diffusion Models (LDMs) produce high-quality, photo-realistic images; however, the latency incurred by multiple costly inference iterations can restrict their applicability. We introduce LatentCRF, a continuous Conditional Random Field (CRF) model, implemented as a neural network layer, that models the spatial and semantic relationships among the latent vectors in the LDM. By replacing some of the computationally intensive LDM inference iterations with our lightweight LatentCRF, we achieve a superior balance between quality, speed, and diversity. We increase inference efficiency by 33% with no loss in image quality or diversity compared to the full LDM. LatentCRF is an easy add-on that does not require modifying the LDM.