Energy-based models (EBMs) have been known in the machine learning community for decades. Since the seminal works on EBMs in the 2000s, many efficient methods have been developed that solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and is limited to a few recent works (excluding WGAN-based approaches, which use OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and entropy-regularized OT. We present a novel methodology that uses the recent developments and technical improvements of the former to enrich the latter. From the theoretical perspective, we prove generalization bounds for our technique. In practice, we validate its applicability on toy 2D and image domains. To showcase scalability, we combine our method with a pre-trained StyleGAN and apply it to high-resolution AFHQ $512\times 512$ unpaired image-to-image (I2I) translation. For simplicity, we choose basic short- and long-run EBMs as the backbone of our Energy-guided Entropic OT approach, leaving the application of more sophisticated EBMs for future research. Our code is publicly available.