In this study, we focus on sampling from the latent space of autoencoder-based generative models so that the reconstructed samples are lifelike images. To do so, we introduce a novel post-training sampling algorithm rooted in the concept of probability mass functions and coupled with a quantization process. The algorithm establishes a neighborhood around each latent vector from the input data and then draws samples from these neighborhoods. This ensures that the sampled latent vectors lie predominantly in high-probability regions, so that they can be decoded into realistic images. A natural point of comparison for our algorithm is sampling based on Gaussian mixture models (GMM), owing to its inherent capability to represent clusters. We reduce the time complexity from $\mathcal{O}(n\times d \times k \times i)$ for GMM sampling to $\mathcal{O}(n\times d)$, which yields a substantial runtime speedup. Moreover, our experimental results, measured by the Fr\'echet inception distance (FID) for image generation, show that our sampling algorithm performs better across a diverse range of models and datasets. On the MNIST benchmark dataset, our approach improves FID by up to $0.89$ over GMM sampling. For images of faces and ocular images, it improves FID over GMM sampling by $1.69$ and $0.87$ on the CelebA and MOBIUS datasets, respectively. Lastly, we use the Wasserstein distance to show that our method estimates latent space distributions more effectively than GMM sampling.
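To make the neighborhood-based sampling idea concrete, the following is a minimal sketch and not the exact algorithm of this work. It assumes that quantization means binning each latent coordinate on a uniform grid, and that the probability mass of a grid cell is the fraction of training latent vectors falling into it. The function name `sample_latents` and the parameter `bin_width` are hypothetical, introduced only for illustration.

```python
import math
import random


def sample_latents(latents, num_samples, bin_width=0.5, seed=0):
    """Sketch: draw latent vectors from quantized neighborhoods of training latents.

    Assumptions (not taken from the source): a uniform grid with cell size
    `bin_width`, and uniform sampling inside the chosen cell.
    """
    rng = random.Random(seed)
    # Quantize every coordinate to its bin index: one pass over n * d values.
    cells = [[math.floor(x / bin_width) for x in z] for z in latents]
    samples = []
    for _ in range(num_samples):
        # Choosing a training vector uniformly selects its cell with probability
        # proportional to the cell's occupancy, i.e. an empirical mass function.
        cell = rng.choice(cells)
        # Draw uniformly inside that cell, the neighborhood of the chosen vector.
        samples.append([(c + rng.random()) * bin_width for c in cell])
    return samples


if __name__ == "__main__":
    toy_rng = random.Random(1)
    # Toy 2-D "latent vectors"; in practice these would come from a trained encoder.
    train_latents = [[toy_rng.gauss(0, 1) for _ in range(2)] for _ in range(200)]
    for z in sample_latents(train_latents, num_samples=3):
        print(z)
```

Under these assumptions the quantization step touches each of the $n \times d$ latent coordinates once and involves no iterative fitting, which is consistent with the $\mathcal{O}(n\times d)$ cost stated above. The sampled vectors would then be passed through the decoder to obtain images.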