Generative adversarial networks (GANs) are popular for generative tasks; however, they often require careful architecture selection and extensive empirical tuning, and they are prone to mode collapse. To overcome these challenges, we propose a novel model that identifies the low-dimensional structure of the underlying data distribution, maps it into a low-dimensional latent space while preserving the underlying geometry, and then optimally transports a reference measure to the embedded distribution. We prove three key properties of our method: 1) the encoder preserves the geometry of the underlying data; 2) the generator is $c$-cyclically monotone, where $c$ is an intrinsic embedding cost employed by the encoder; and 3) the discriminator's modulus of continuity improves with the geometric preservation of the data. Numerical experiments demonstrate that our approach generates high-quality images and is robust to both mode collapse and training instability.
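
For readers unfamiliar with property 2, the standard optimal-transport definition of $c$-cyclical monotonicity can be stated as follows. The notation here is an assumption made for illustration: a map $T: X \to Y$ stands in for the generator and $c: X \times Y \to \mathbb{R}$ for the cost; the abstract does not specify the spaces involved or the exact form of $c$. Under that notation, $T$ is $c$-cyclically monotone if, for every $n \ge 1$, every choice of points $x_1, \dots, x_n \in X$, and every permutation $\sigma$ of $\{1, \dots, n\}$,

$$
\sum_{i=1}^{n} c\bigl(x_i, T(x_i)\bigr) \;\le\; \sum_{i=1}^{n} c\bigl(x_i, T(x_{\sigma(i)})\bigr).
$$

Informally, no reshuffling of the targets among finitely many source points can lower the total cost, which is the characteristic property of the support of an optimal transport plan.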