Irregular distributions in the latent space cause posterior collapse, misalignment between the posterior and the prior, and ill-sampling problems in Variational Autoencoders (VAEs). In this paper, we introduce a novel, adaptable three-stage Uniform Transformation (UT) module -- Gaussian Kernel Density Estimation (G-KDE) clustering, non-parametric Gaussian Mixture (GM) modeling, and the Probability Integral Transform (PIT) -- to address irregular latent distributions. By reconfiguring irregular distributions into a uniform distribution in the latent space, our approach significantly enhances the disentanglement and interpretability of latent representations, overcoming the limitations of traditional VAE models in capturing complex data structures. Empirical evaluations demonstrate the efficacy of the proposed UT module in improving disentanglement metrics on two benchmark datasets -- dSprites and MNIST. Our findings suggest a promising direction for advancing representation learning, with implications for future work extending this framework to more sophisticated datasets and downstream tasks.
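The core mechanism can be sketched for the one-dimensional case: once a Gaussian mixture has been fitted to the latent codes, pushing each code through the mixture's cumulative distribution function (the Probability Integral Transform) maps it to a value in Uniform(0, 1). The sketch below is illustrative only; the mixture weights, means, and standard deviations are hypothetical stand-ins for the estimates the paper's G-KDE clustering and non-parametric GM stages would produce.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical 1-D latent codes drawn from an irregular bimodal distribution
z = np.concatenate([rng.normal(-2.0, 0.5, 500), rng.normal(3.0, 1.0, 500)])

# Stage 2 (stand-in): parameters of a two-component Gaussian mixture assumed
# to have been fitted to the latent codes
weights = np.array([0.5, 0.5])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

def mixture_cdf(x):
    # CDF of the Gaussian mixture: weighted sum of the component CDFs
    return np.sum(weights * stats.norm.cdf(x[:, None], means, stds), axis=1)

# Stage 3: Probability Integral Transform -- mapping samples through their
# own CDF yields approximately Uniform(0, 1) values
u = mixture_cdf(z)

# For Uniform(0, 1), the mean is 0.5 and the standard deviation is 1/sqrt(12)
print(u.mean(), u.std())
```

Because the transform composes each code with the CDF of its own (estimated) distribution, the better the mixture fit, the closer the transformed codes are to uniform; deviations from uniformity diagnose a poor fit in the earlier stages.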