Latent-variable generative models, such as generative adversarial networks (GANs) and variational auto-encoders (VAEs), have attracted considerable interest due to their impressive performance in many fields. However, data such as natural images typically do not fill the ambient Euclidean space but instead lie on a lower-dimensional manifold. Consequently, an inappropriate choice of latent dimension fails to uncover the structure of the data, potentially causing mismatched latent representations and poor generative quality. To address these problems, we propose a novel framework, the latent Wasserstein GAN (LWGAN), which fuses the Wasserstein auto-encoder and the Wasserstein GAN so that the intrinsic dimension of the data manifold can be adaptively learned through a modified informative latent distribution. We prove that there exist an encoder network and a generator network such that the intrinsic dimension of the learned encoding distribution equals the dimension of the data manifold. We further show that the estimated intrinsic dimension is a consistent estimator of the true dimension of the data manifold. In addition, we provide an upper bound on the generalization error of LWGAN, implying that the synthetic data distribution is driven toward the real data distribution at the population level. Comprehensive empirical experiments verify our framework, showing that LWGAN identifies the correct intrinsic dimension in several scenarios while generating high-quality synthetic data by sampling from the learned latent distribution.
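The central idea above is that the per-dimension scales of a learned latent distribution can reveal how many directions the data actually occupy. LWGAN learns these scales adversarially through its encoder; as a loose linear analogue (an illustrative assumption, not the paper's method), the sketch below uses PCA-style singular values as stand-ins for per-dimension latent scales on toy data lying on a 2-dimensional manifold inside a 10-dimensional ambient space, and counts the scales that do not collapse toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a 2-dimensional linear manifold embedded in 10-dim ambient space.
d_true, D, n = 2, 10, 2000
basis = rng.normal(size=(d_true, D))       # manifold directions (assumed linear)
X = rng.normal(size=(n, d_true)) @ basis   # points on the manifold
X += 0.01 * rng.normal(size=(n, D))        # small ambient noise

# Linear stand-in for LWGAN's informative latent distribution: singular
# values of the centered data play the role of per-dimension latent scales;
# near-zero scales mark directions along which the distribution collapses.
s = np.linalg.svd(X - X.mean(0), compute_uv=False) / np.sqrt(n)
intrinsic_dim = int((s > 10 * s.min()).sum())  # heuristic threshold choice
print(intrinsic_dim)  # recovers d_true = 2 on this toy example
```

In LWGAN itself the collapse of uninformative latent directions is driven by the combined Wasserstein auto-encoder and Wasserstein GAN objectives on nonlinear manifolds, rather than by a spectral threshold.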