This paper explores the problem of generative modeling, which aims to simulate diverse samples from an unknown distribution given a set of observed examples. While recent studies have focused on quantifying the statistical precision of popular algorithms, mathematical evaluation of the non-replication of observed examples and the creativity of the generative model remains lacking. We present theoretical insights into this aspect, demonstrating that the Wasserstein GAN, constrained to left-invertible push-forward maps, generates distributions that avoid replication and deviate significantly from the empirical distribution. Importantly, we show that left-invertibility achieves this without compromising the statistical optimality of the resulting generator. Our main contribution is a finite-sample lower bound on the Wasserstein-1 distance between the generative distribution and the empirical one. We also establish a finite-sample upper bound on the distance between the generative distribution and the true data-generating one. Both bounds are explicit and show the impact of key parameters such as the sample size, the dimensions of the ambient and latent spaces, the noise level, and the smoothness measured by the Lipschitz constant.
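As a side illustration, not taken from the paper itself: both bounds above are stated in the Wasserstein-1 distance, which for one-dimensional empirical samples can be computed in closed form as the area between the two empirical CDFs. The sketch below, with distributions and sample sizes chosen purely for illustration, compares "observed" samples with samples from a shifted "generator" using SciPy's `wasserstein_distance`.

```python
# Illustrative sketch (our own example, not the paper's method): computing
# the Wasserstein-1 distance between two 1-D empirical distributions.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
observed = rng.normal(loc=0.0, scale=1.0, size=500)   # stand-in for data samples
generated = rng.normal(loc=0.5, scale=1.0, size=500)  # stand-in for generator output

# In one dimension, W1 equals the integral of |F_obs - F_gen| over the line,
# which wasserstein_distance evaluates exactly from the samples.
w1 = wasserstein_distance(observed, generated)
print(f"W1 estimate: {w1:.3f}")  # close to the 0.5 mean shift, up to sampling noise
```

For distributions differing only by a location shift, the empirical W1 concentrates around the size of that shift, which makes this a convenient sanity check when experimenting with the kind of generative-versus-empirical distances the bounds describe.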